REGRESSION-DISCONTINUITY ANALYSIS:
AN ALTERNATIVE TO THE EX POST FACTO EXPERIMENT¹


DONALD L. THISTLETHWAITE
National Merit Scholarship Corporation

AND DONALD T. CAMPBELL
Northwestern University


While the term “ex post facto experiment” could refer to any analysis of records which provides a quasi-experimental test of a causal hypothesis, as described by Chapin (1938) and Greenwood (1945), it has come to indicate more specifically the mode of analysis in which two groups, an experimental and a control group, are selected through matching to yield a quasi-experimental comparison. In such studies the groups are presumed, as a result of matching, to have been equivalent prior to the exposure of the experimental group to some potentially change-inducing event (the “experimental treatment”). If the groups differ on subsequent measures and if there are no plausible rival hypotheses which might account for the differences, it is inferred that the experimental treatment has caused the observed differences.

¹ This study is a part of the research program of the National Merit Scholarship Corporation. This research was supported by the National Science Foundation, the Old Dominion Foundation, and by Ford Foundation grants to the National Merit Scholarship Corporation. The participation of the second author was made possible through the Northwestern University Carnegie Corporation Project in Psychology-Education. The mode of analysis illustrated in Figure 1 of this paper was first suggested by the second author, and will be presented in a chapter entitled “Experimental Designs in Research on Teaching” in the forthcoming NEA, AERA Handbook of Research on Teaching to be published by Rand McNally and edited by N. L. Gage.

This paper has three purposes: first, it presents an alternative mode of analysis, called regression-discontinuity analysis, which we believe can be more confidently interpreted than the ex post facto design; second, it compares the results obtained when both modes of analysis are applied to the same data; and, third, it qualifies interpretations of the ex post facto study recently reported in this journal (Thistlethwaite, 1959).

Two groups of near-winners in a national scholarship competition were matched on several background variables in the previous study in order to study the motivational effect of public recognition. The results suggested that such recognition tends to increase the favorableness of attitudes toward intellectualism, the number of students planning to seek the MD or PhD degree, the number planning to become college teachers or scientific researchers, and the number who succeed in obtaining scholarships from other scholarship granting agencies. The regression-discontinuity analysis to be presented here confirms the effects upon success in winning scholarships from other donors but negates the inference of effects upon attitudes and is equivocal regarding career plans.
METHOD
Subjects and Data²


Two groups of near-winners (5,126 students who received Certificates of Merit and 2,848 students who merely received letters of commendation) answered a questionnaire approximately 6 months after the announcement of awards in the second National Merit Scholarship program. The C of M group received greater public recognition: their names were published in a booklet distributed to colleges, universities, and other scholarship granting agencies, and they received approximately two and one-half times more newspaper coverage than commended students. The decision to award some students the Certificate of Merit, which meant greater public recognition, was made chiefly on the basis of “qualifying scores” on the CEEB Scholarship Qualifying Test (SQT). A second aptitude test, the Scholastic Aptitude Test, was used to confirm the high ability of all finalists, i.e., all students scoring above the SQT qualifying score for the state in which the student attended high school.³ Two hundred and forty-one students who voluntarily withdrew from the program before the second test or whose scores were not confirmed received neither award, while 7,255 students who satisfactorily completed the second test received Certificates of Merit. The latter were subsequently screened by a selection committee, and 827 of these students were awarded Merit Scholarships. Since the interest is in estimating the effects of honorary awards, questionnaire responses from Merit Scholars are not included in these analyses. As Table 1 shows, response rate did not vary systematically by test score interval, and there is no reason to believe that differential response bias can account for the effects to be described.

² Details of the sample of students, the experimental treatment, and dependent variables are described in the previous report (Thistlethwaite, 1959), and only the essential features of the data collection will be discussed here.

³ Recognition awards in the 1957 Merit program were distributed so that the number of students recognized in each state was proportional to the number of public high school graduates in each state. Since there were marked state differences in student performance on this test, qualifying scores varied from state to state. All SQT scores represented a composite in which verbal scores were weighted twice as heavily as mathematical scores.
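The response-rate claim can be checked directly from Table 1, since the percentage columns are simple ratios of the counts beside them. Column 5 is column 4 (number responding) divided by column 3 (designated sample). Column 6 appears to be the Merit Scholars in column 2 divided by all Certificate of Merit winners in the interval (columns 2 plus 3). That reading is our inference from the totals (827 of 7,255), not something the text states. The Python sketch below recomputes a few rows. The row selection and function names are illustrative only.

```python
def response_rate(designated: int, responding: int) -> float:
    """Column 5 of Table 1: percent of the designated sample who returned the questionnaire."""
    return round(100 * responding / designated, 1)


def merit_scholar_share(excluded: int, designated: int) -> float:
    """Column 6 of Table 1 (our reading): Merit Scholars as a percent of all C of M winners."""
    return round(100 * excluded / (excluded + designated), 1)


# (designated sample, number responding) for a few rows of Table 1
rows = {
    "commended, below 1": (419, 322),
    "commended, interval 10": (262, 205),
    "commended, totals": (3588, 2848),
    "C of M, interval 11": (476, 380),
    "C of M, above 20": (2778, 2219),
    "C of M, totals": (6428, 5126),
}

for label, (designated, responding) in rows.items():
    print(f"{label:24s} {response_rate(designated, responding):5.1f}%")

# Overall share of C of M winners who became Merit Scholars (827 of 827 + 6,428)
print(merit_scholar_share(827, 6428))
```

All of the recomputed rates fall within a few points of 80%, which matches the flat pattern the text describes.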


Regression-Discontinuity Analysis 


In situations such as the foregoing, where exposure to an experimental treatment (in this case, increased public recognition) is determined by the subject’s standing on a single, measured variable, and where the expected effects of the treatment are of much the same nature as would be produced by increasing magnitudes of that variable, examination of the details of the regression may be used to assess experimental effects. The experimental treatment should provide an additional elevation to the regression of dependent variables on the exposure determiner, providing a steplike discontinuity at the cutting score.
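One way to make the “steplike discontinuity” concrete is to fit separate least-squares lines to the points below and above the cutting score. Their heights are then compared midway between Intervals 10 and 11, at 10.5. The Python sketch below is illustrative only. The two noise-free series (one with an assumed 8-point step, one with no step) and the helper names `fit_line` and `discontinuity` are hypothetical. They are not the authors' data or procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def discontinuity(xs, ys, cut=10.5):
    """Height of the above-cut line minus the below-cut line, both evaluated at `cut`."""
    below = [(x, y) for x, y in zip(xs, ys) if x < cut]
    above = [(x, y) for x, y in zip(xs, ys) if x > cut]
    a0, b0 = fit_line(*zip(*below))
    a1, b1 = fit_line(*zip(*above))
    return (a1 + b1 * cut) - (a0 + b0 * cut)


xs = list(range(1, 21))                                  # score Intervals 1-20
aa = [20 + 2 * x + (8 if x >= 11 else 0) for x in xs]    # hypothetical: common slope, 8-point step
dd = [20 + 2 * x for x in xs]                            # hypothetical: no treatment effect
print(round(discontinuity(xs, aa), 6), round(discontinuity(xs, dd), 6))
```

With no step, the two fitted lines meet at the cut and the estimated gap is zero. A positive gap is the “additional elevation” the treatment is expected to produce.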

The argument, and the limitations on generality of the result, can be made more specific by considering a “true” experiment for which the regression-discontinuity analysis may be regarded as a substitute. It would be both indefensible and infeasible to conduct an experiment in which a random group of students along the whole range of abilities would be given the C of M award while a randomly equivalent group received merely the letter of commendation. However, a group of commended students who narrowly missed receiving the higher award might be given the opportunity of receiving extra recognition. Thus students in Interval 10 in Figure 1 might be randomly assigned to the different treatments of C of M award and no C of M award. The two half-circle points at 10 for Line AA’ in Figure 1 illustrate a possible outcome for such a true experiment, the solid half-circle representing the award group, and the hollow half-circle the no award group. Alternatively, a similar true experiment might be carried out among students just above the cutting point (Score 11 in Figure 1). For reasons discussed below, the regression-discontinuity analysis attempts to simulate the latter of these two experiments, by extrapolating from the below-cutting-point line to an “untreated” Point 11 value (an inferred substitute for the no award “control group”). Thus the major evidence of effect must be a distinct discontinuity or difference in intercept at the cutting point. Outcomes such as those shown in Line AA’ would, of course, be strictly demonstrated only for aptitude intervals adjacent to the cutting point, and inferences as to effects of the C of M award upon persons of other ability levels would be made in hazard of unexplored interactions of award and ability level. Inferences as to what the regression line would have looked like without the C of M award become more and more suspect the further the no award experience of Points 1 to 10 has to be extrapolated. The extrapolation is best for Point 11 and becomes increasingly implausible for Points 12 through 20.

TABLE 1
PARTICIPANTS IN 1957 MERIT PROGRAM CLASSIFIED BY APTITUDE SCORE INTERVAL

Columns: (1) score intervalᵃ; (2) number of Merit Scholars excluded; (3) number in designated sampleᵇ; (4) number responding; (5) percentage responding; (6) percentage of C of M winners awarded Merit Scholarships.

Group                          (1)         (2)     (3)     (4)    (5)    (6)
Commended students             Below 1             419     322   76.8
                               1                   318     256   80.5
                               2                   368     281   76.4
                               3                   320     258   80.6
                               4                   407     338   83.1
                               5                   324     259   79.9
                               6                   333     267   80.2
                               7                   280     213   76.1
                               8                   301     248   82.4
                               9                   256     201   78.5
                               10                  262     205   78.2
                               Totals            3,588   2,848   79.4
Certificate of Merit winners   11           17     476     380   79.8    3.4
                               12           22     466     370   79.4    4.5
                               13           16     399     319   79.9    3.9
                               14           17     371     298   80.3    4.4
                               15           19     361     300   83.1    5.0
                               16           34     358     289   80.7    8.7
                               17           13     319     247   77.4    3.9
                               18           18     345     256   74.2    5.0
                               19           17     254     211   83.1    6.3
                               20           23     301     237   78.7    7.1
                               Above 20    631   2,778   2,219   79.9   18.5
                               Totals      827   6,428   5,126   79.7   11.4

ᵃ Intervals show the student’s SQT score relative to the qualifying score in the student’s state, e.g., subjects whose scores equaled the qualifying score are classified in Interval 11, those whose scores were one unit less than the qualifying score are classified in Interval 10, etc.
ᵇ The designated sample for commended students consisted of a 47% random sample of all commended students.

Fig. 1. Hypothetical outcomes of a regression-discontinuity analysis. (Vertical axis: percent of students exhibiting given attribute; horizontal axis: test scores of students in arbitrary units, with Certificate of Merit winners to the right of the cutting point.)


To better illustrate the argument, several hypothetical outcomes are shown in Figure 1. Line AA’ indicates a hypothetical regression of the percentage exhibiting Attribute A as a function of score on the decision variable. The steplike discontinuity which begins at the point where the experimental treatment begins to operate would be convincing evidence that the certificate has had an effect upon Attribute A. Similarly, outcomes such as those shown by Lines BB’ and CC’ would indicate genuine treatment effects. Line DD’ is a pure case of no effect. Lines EE’ and FF’ are troublesome: there seems to be a definite change in the regression lines, but the steplike discontinuity at the cutting point is lacking. Consequently the points could merely represent continuous, curvilinear regressions. It seems best not to interpret such ambiguous outcomes as evidence of effects.

In applying this mode of analysis to the present data, the qualifying score in each state was used as a fixed point of reference, and students were classified according to the number of score intervals their SQT score fell above or below the qualifying score in their state. For example, in Figure 2 all students whose scores equaled the qualifying score in their state have been classified in Interval 11, while all those whose scores were one less than the relevant qualifying score have been classified in Interval 10. Data were analyzed only for subjects whose scores placed them within 10 score intervals of the relevant cutting point. Because of nonresponse to particular questionnaire items, the Ns for percentages and means in Figures 2-4 differ slightly from those shown in Column 4 of Table 1.

Fig. 2. Regression of success in winning scholarships on exposure determiner. (Line GG’: percent winning scholarships; Line HH’: percent winning scholarships of $150 or more; horizontal axis: aptitude test scores of students, with Certificate of Merit winners to the right of the cutting point.)

Fig. 3. Regression of study and career plans on exposure determiner.
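The classification just described amounts to re-centering each student's SQT score on his own state's qualifying score. The Python sketch below shows that mapping under two stated assumptions. First, one score interval equals one SQT score unit, as the Table 1 footnote implies. Second, the state names and qualifying scores are hypothetical, because the actual SQT scale values are not given in the text.

```python
# Hypothetical qualifying scores; the real SQT scale values are not given in the text.
QUALIFYING_SCORE = {"State A": 140, "State B": 152}


def score_interval(sqt_score: int, qualifying_score: int) -> int:
    """Interval 11 is the state's qualifying score; each SQT unit moves one interval."""
    return 11 + (sqt_score - qualifying_score)


def in_analysis_range(interval: int) -> bool:
    """Only Intervals 1 through 20, i.e. within 10 intervals of the cut, were analyzed."""
    return 1 <= interval <= 20


def award_group(interval: int) -> str:
    """Within the sample, Interval 11 and above held the C of M; lower intervals were commended."""
    return "C of M" if interval >= 11 else "commended"


students = [("State A", 140), ("State A", 139), ("State B", 150), ("State B", 163)]
for state, score in students:
    k = score_interval(score, QUALIFYING_SCORE[state])
    print(state, score, k, award_group(k), in_analysis_range(k))
```

Once scores are centered this way, a student at the qualifying score in a low-scoring state and one at the qualifying score in a high-scoring state both land in Interval 11. This is what lets all states share a single cutting point.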


RESULTS 


Graphic Presentation of Results 


Figures 2, 3, and 4 present the results for five variables, with least squares linear regression lines fitted to the points. In Figure 2, both regression lines for scholarships received seem to show a marked discontinuity at the cutting point. The persuasive appearance of effect is, however, weakened by the jaggedness of the regression lines at other points, particularly to the right of the cutting score. In addition, the slopes of the right-hand lines indicate that the effects are specific to students near the cutting score. The downward trend with high scores is presumably a result of eliminating from consideration those receiving Merit Scholarships. Where those of high aptitude test scores are passed over for National Merit Scholarships, it is usually for undistinguished high school grades, which likewise affect the scholarship awards by other agencies as plotted in Figure 2. Table 1 shows that, in general, larger proportions of C of M winners in the highest score intervals were selected for Merit Scholarships.

The two plots in Figure 3 show less discontinuity at the cutting point: there is little or no indication of effect. In II’ the difference between observed values at 10 and 11 is small and, while in the hypothesized direction, is exceeded by five other ascending gaps. In JJ’ the observed 10-11 jump is actually in the wrong direction. On the other hand, it is confirming of the hypothesis of effect that all of the observed Points 11 through 20 lie above the extrapolated line of best fit for Points 1 to 10, in both II’ and JJ’. But this could well be explained by the rival hypothesis of an uninterrupted curvilinear regression from Points 1 to 20. The picture is ambiguous enough to leave us skeptical as to the effects upon the student’s study and career plans. The analysis neither confirms nor denies the ex post facto findings.

DONALD L. THISTLETHWAITE AND DONALD T. CAMPBELL

Fig. 4. Regression of attitudes toward intellectualism on exposure determiner. (Vertical axis: score on intellectualism scale; horizontal axis: aptitude test scores of students in arbitrary units, with commended students below and Certificate of Merit winners above the cutting point.)

In Figure 4 no such ambiguity remains. It is inconceivable in view of this evidence that the Certificate of Merit award has increased favorableness of attitudes toward intellectualism, a finding clearly contradicting the ex post facto analysis.


The Problem of Tests of Significance 


In discussing tests of significance in this case, it is probably as important to indicate which tests of significance are ruled out as to indicate those which seem appropriate. Again, reference to the pure cases of Figure 1 will be helpful. A simple t test between Points 10 and 11 is excluded, because it would show significance in an instance like DD’ if the overall slope were great enough. That is, such a test ignores the general regression obtained independently of the experimental treatment. Such a test between adjacent points is likewise ruled out on the consideration that, even if significant in itself, it is uninterpretable if part of a very jagged line in which jumps of equal significance occur at numerous other places where not expected. Similarly, a t test of the difference between the means of all points on each side of the cutting point would give significance to cases such as DD’ or EE’, which would be judged irrelevant. Furthermore, covariance tests applied to the regression lines (e.g., Walker & Lev, 1953, pp. 390-395) are judged inappropriate, because of the differential sample bias for the score intervals arising from the exclusion of Merit Scholars. Even in the ideal case, if the hypothesis of common slope is rejected (as it would be for lines such as EE’ and FF’), we presumably could not proceed further with a simple linear version of the covariance model.
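Why a comparison of side means is ruled out can be seen with a pure no-effect line like DD’. The Python sketch below builds a hypothetical straight-line series with no step at the cut, adds small fixed deviations, and compares the mean of Points 1-10 with the mean of Points 11-20. Welch's unequal-variance t is used here as a stand-in, because the paper does not specify which form of t test it has in mind. The series and deviations are assumptions.

```python
import math
import statistics


def welch_t(below, above):
    """Welch two-sample t statistic for mean(above) - mean(below)."""
    se = math.sqrt(statistics.variance(below) / len(below)
                   + statistics.variance(above) / len(above))
    return (statistics.mean(above) - statistics.mean(below)) / se


# Hypothetical no-effect series: one straight line through all 20 points plus
# small fixed deviations, with no step at the cutting point (the DD' case).
deviations = [0.4, -0.3, 0.1, -0.5, 0.2, 0.3, -0.2, 0.5, -0.1, -0.4]
below = [20 + 2 * x + deviations[x - 1] for x in range(1, 11)]    # Points 1-10
above = [20 + 2 * x + deviations[x - 11] for x in range(11, 21)]  # Points 11-20

t = welch_t(below, above)
print(round(t, 1))
```

The side means differ by 20 points purely because of the common slope, so t lands far beyond conventional critical values even though there is no treatment effect at all.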

Mood (1950, pp. 297-298) provides a t test appropriate for testing the significance of the deviation of the first experimental value beyond the cutting point (i.e., the observed Point 11) from a value predicted from a linear fit of the control values (i.e., the encircled point in Figures 2, 3, and 4, extrapolated from Points 1 through 10). As applied here, each plotted point has been treated as a single observation. On this basis, both of the plots in Figure 2 show a significant effect at Point 11: for GG’, p < .025; for HH’, p < .01 (one-tailed tests). Thus the Certificate of Merit seems to have significantly increased chances of obtaining scholarships from other sources. For none of the other figures does this test approach significance.
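The test described here compares the observed Point 11 with the value extrapolated from the line fitted to Points 1 through 10. The Python sketch below assumes this is the usual prediction-error t for a single new observation, with n − 2 degrees of freedom; we have not checked the exact form in Mood (1950). The ten control values and the observed Point 11 are hypothetical, because the plotted values in Figures 2-4 are not reproduced in the text. The critical values are the standard table entries for 8 degrees of freedom.

```python
import math

# One-tailed critical values of Student's t with 8 degrees of freedom
# (10 control points minus 2 fitted parameters), from standard t tables.
T_CRIT_DF8 = {0.025: 2.306, 0.01: 2.896}


def mood_t(xs, ys, x0, y0):
    """t for the deviation of a new observation (x0, y0) from the OLS line fitted to (xs, ys).

    Error term: s^2 * (1 + 1/n + (x0 - mean_x)^2 / Sxx), with n - 2 degrees of freedom.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    se = math.sqrt(s2 * (1 + 1 / n + (x0 - mx) ** 2 / sxx))
    return (y0 - (a + b * x0)) / se, n - 2


# Hypothetical percentages for Points 1-10, each point treated as one observation
xs = list(range(1, 11))
ys = [30.1, 31.8, 34.2, 35.9, 38.1, 39.8, 42.3, 43.7, 46.2, 47.9]
t, df = mood_t(xs, ys, x0=11, y0=56.0)  # hypothetical observed Point 11
print(df, round(t, 2), t > T_CRIT_DF8[0.01])
```

The `(x0 - mean_x)^2 / Sxx` term is what makes extrapolation to Point 11 cost more precision than interpolation would. It is also why extrapolating further, to Points 12-20, becomes less trustworthy.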

The test in this form fails to make use of the potentially greater stability made available by considering the trend of all of the Values 11 through 20. Potentially the logic of the Mood test could be extended to provide an error term for the difference between two extrapolated points at 10.5, one extrapolated from Points 1 through 10, the other from Points 11 through 20. In many applications of the regression-discontinuity analysis, this would be the most appropriate and most powerful test. In our present instance, we have judged it inappropriate because of the differential sampling bias felt to exist in the range of Points 11-20, as explained above.
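The extension suggested here, which the authors do not apply to these data, can be sketched under our own assumptions. Fit each side separately, pool the residual variance (df = n1 + n2 − 4), and treat the two heights at 10.5 as independent estimates. This is one plausible construction of the error term, not necessarily the one the authors had in mind. The data are hypothetical.

```python
import math


def fit(xs, ys):
    """OLS fit; returns intercept, slope, mean of x, Sxx, and residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, mx, sxx, sse


def gap_t(xs_lo, ys_lo, xs_hi, ys_hi, cut=10.5):
    """Gap between two separately fitted lines at `cut`, its t ratio, and degrees of freedom.

    Each line's height at `cut` has variance s^2 * (1/n + (cut - mean_x)^2 / Sxx),
    with s^2 pooled from both fits.
    """
    a0, b0, m0, sxx0, sse0 = fit(xs_lo, ys_lo)
    a1, b1, m1, sxx1, sse1 = fit(xs_hi, ys_hi)
    n0, n1 = len(xs_lo), len(xs_hi)
    df = n0 + n1 - 4
    s2 = (sse0 + sse1) / df
    var_gap = s2 * (1 / n0 + (cut - m0) ** 2 / sxx0 + 1 / n1 + (cut - m1) ** 2 / sxx1)
    gap = (a1 + b1 * cut) - (a0 + b0 * cut)
    return gap, gap / math.sqrt(var_gap), df


# Hypothetical series: slope 2 on both sides, an 8-point step at the cut, small fixed deviations
dev = [0.1, -0.2, 0.2, -0.1, 0.1, -0.2, 0.3, -0.3, 0.2, -0.1]
xs_lo, xs_hi = list(range(1, 11)), list(range(11, 21))
ys_lo = [28 + 2 * x + d for x, d in zip(xs_lo, dev)]
ys_hi = [36 + 2 * x + d for x, d in zip(xs_hi, dev)]
gap, t, df = gap_t(xs_lo, ys_lo, xs_hi, ys_hi)
print(round(gap, 2), round(t, 1), df)
```

Because both fits contribute, this uses all twenty points. It is only as good as the above-cut data, though, which is why the authors set it aside here: the exclusion of Merit Scholars biases Points 11-20.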


DISCUSSION


A critic may easily question the results of an ex post facto experiment by supposing that one or more relevant matching variables has been inadequately controlled or entirely overlooked. In contrast, the regression-discontinuity analysis does not rely upon matching to equate experimental and control groups; hence it avoids the difficulties of (a) differential regression-toward-the-mean effects, and (b) incomplete matching due to failure to identify and include all relevant antecedent characteristics in the matching process.

Edwards (1954, pp. 279-282) has shown how pseudo effects may be produced in ex post facto designs through differential regression effects. Suppose, for example, we were to match, with respect to aptitude test scores, a group exposed to recognition and a group not exposed to recognition. Since exposure to recognition tends to be positively correlated with aptitude test score, we expect that the matched experimental subjects will have low aptitude scores relative to other exposed subjects, while the matched control subjects will have high aptitude scores relative to other unexposed subjects. To the extent that there are errors of measurement on the aptitude variable, however, our experimental group is apt to contain subjects whose aptitude scores are too low through error, while our control group is apt to contain subjects whose aptitude scores are too high through error. Simply on the basis of regression effects, then, we can predict that the matched experimental group will excel the matched control group on a subsequent administration of the aptitude test and on any other variable positively correlated with aptitude. Following Thorndike (1942, pp. 100-101), who discussed a similar problem, one might attempt to match individuals on the basis of predicted true score on the background trait, i.e., the score predicted by the regression equation between original test and a retest at the time of the experimental comparison. However, the predicted true score for each individual must be determined from the regression equation for his own population, and for groups when the special treatment is not applied. Unfortunately such matching is usually impossible in situations where we wish to use the ex post facto design, since we typically cannot obtain pretest and posttest measures on control variables for “experimental” groups from which the special treatment has been withheld. Indeed, if we had the power to withhold the treatment from some subjects, we would usually be able to test our causal hypotheses by an experiment with true randomization. In short, the suggested procedure for controlling regression effects in ex post facto studies presupposes knowledge which we typically cannot obtain.
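Both Edwards's regression artifact and the Thorndike-style remedy show up in a small simulation. The Python sketch below rests on several assumptions, none of them figures from the paper: recognition has no effect at all, test reliability is .5, and exposure depends on true aptitude plus unrelated noise. Matching on the observed test score produces a spurious retest advantage for the exposed group. Matching instead on each group's own predicted retest score largely removes it.

```python
import random
import statistics


def simulate(n=100_000, seed=1):
    """Hypothetical population in which recognition has NO effect on anything.

    aptitude ~ N(0, 1); the original test and the retest each add independent
    N(0, 1) error (reliability .5); exposure depends on aptitude plus unrelated
    noise, so it is correlated with, but not fixed by, the observed test score.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        aptitude = rng.gauss(0, 1)
        test = aptitude + rng.gauss(0, 1)
        retest = aptitude + rng.gauss(0, 1)
        exposed = aptitude + rng.gauss(0, 1) > 0
        people.append((exposed, test, retest))
    return people


def matched_gap(people, score, lo, hi):
    """Mean retest of exposed minus unexposed among people with lo <= score(exposed, test) <= hi."""
    exp = [r for e, t, r in people if e and lo <= score(e, t) <= hi]
    unexp = [r for e, t, r in people if not e and lo <= score(e, t) <= hi]
    return statistics.fmean(exp) - statistics.fmean(unexp)


def own_population_prediction(people):
    """Predicted retest from the original test, using each group's own regression line."""
    coef = {}
    for flag in (True, False):
        ts = [t for e, t, r in people if e == flag]
        rs = [r for e, t, r in people if e == flag]
        mt, mr = statistics.fmean(ts), statistics.fmean(rs)
        slope = (sum((t - mt) * (r - mr) for t, r in zip(ts, rs))
                 / sum((t - mt) ** 2 for t in ts))
        coef[flag] = (mr - slope * mt, slope)
    return lambda e, t: coef[e][0] + coef[e][1] * t


people = simulate()
# Matching on the observed test score: a spurious "effect" of recognition appears.
naive = matched_gap(people, lambda e, t: t, -0.1, 0.1)
# Matching on each group's own predicted retest score instead.
adjusted = matched_gap(people, own_population_prediction(people), -0.05, 0.05)
print(round(naive, 2), round(adjusted, 2))
```

The simulation can fit a retest regression for the exposed group only because no treatment is ever applied. In a real study, that group's untreated retest regression is exactly the knowledge the text says we cannot obtain.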

In the present analysis exposed and unexposed groups are subdivided according to their closeness to receiving a treatment other than the one they have received. Background traits correlated with the probability of exposure to recognition (e.g., rank in high school graduating class, scholastic aptitude, etc.) presumably vary systematically with the score intervals which represent the student’s nearness to the cutting point. All of these traits contribute to the observed slopes of the regression lines plotted in Figures 2-4. Since there is no reason to believe that the composite effect of all relevant background traits fluctuates markedly at the cutting point, regression discontinuities emerging at the 10-11 gap must be attributable to the special experimental treatment, the only factor which assumes an abrupt change in value in this region. Thus the new analysis seems to provide a persuasive test of the presence or absence of experimental effects.⁴

The value of the regression-discontinuity analysis illustrated here is that it provides a more stringent test of causal hypotheses than is provided by the ex post facto design. Admittedly the class of situations to which it is applicable is limited. This class consists of those situations in which the regression of dependent variables on a single determiner of exposure to an experimental treatment can be plotted. Whenever the determiners of exposure are multiple or unknown, this mode of analysis is not feasible.

Of the five variables described in Figures 2-4, the regression-discontinuity analysis indicated significant effects only for those shown in Figure 2. The ex post facto experiment, on the other hand, indicated significant effects for all variables except HH’ (success in winning a freshman scholarship of $150 or more). For six other variables, not reported here, neither analysis indicated a significant effect.⁵ Considering the regression-discontinuity analysis to be the more definitive, it appears that the ex post facto experiment underestimated effects for one variable and wrongly indicated effects for three variables.

We conclude that increased public recognition tends to increase the student’s chances of winning scholarships. There is no clear-cut evidence in the present analysis that such recognition affects the student’s career plans, although an effect upon plans to seek graduate or professional degrees is not ruled out. In this regard, Thistlethwaite (in press) has reported that when near-winners in a subsequent National Merit program were asked, “How did winning a C of M help you?” approximately two out of every five reported that it “increased my desire for advanced training (MA, PhD, MD, etc.).” In short, while other evidence indicates that the hypothesis of effect upon study plans may be correct, the present analysis does not provide confirmation.

⁴ Background traits uncorrelated with the probability of exposure to recognition will, of course, not vary systematically with score intervals, but these traits are irrelevant. Even if partialed out they would not affect the correlation between the dependent variable and degree of exposure to recognition.

⁵ No significant differences were found with respect to the percentages enrolling in college immediately, well satisfied with their choice of college, believing their college offers the best training in their field of study, going to college more than 250 miles from home, applying for two or more scholarships, or receiving encouragement from their high school teachers and guidance counselors to go to college.


SUMMARY 


This report presents and illustrates a method of testing causal hypotheses, called regression-discontinuity analysis, in situations where the investigator is unable to randomly assign subjects to experimental and control groups. It compares the results obtained by the new mode of analysis with those obtained when an ex post facto design was applied to the same data. The new analysis suggested that public recognition for achievement on college aptitude tests tends to increase the likelihood that the recipient will receive a scholarship, but did not support the inference that recognition affects the student’s attitudes and career plans.
The theory has been put forward by Peel (1956) that the first stages of school learning take place by the process of instrumental conditioning. The types of task envisaged by Peel as falling within this framework include such things as simple addition and reading, in which the learning process can be regarded as entailing the formation of stimulus-response connections. For example, the child has to learn that the stimulus of the letters C-A-T should elicit the vocal response “cat.” The purpose of this paper is partly to suggest that this theory explains certain established findings in educational psychology and partly to report a test of one deduction from the theory.

It is evident that if the theory is correct, children who form conditioned responses readily will, other things being equal, be better at arithmetic and reading than those who condition slowly. One way of testing and extending this theory is through the application of the theory of personality advanced by H. J. Eysenck (e.g., 1957). Briefly, this theory has established by factorial methods three independent dimensions of personality, namely, neuroticism, introversion-extraversion, and psychoticism, all of which are independent of intelligence. It has been postulated by Eysenck that the dimension of introversion-extraversion corresponds to Hull’s construct of reactive inhibition, in that it is assumed that extraverts generate reactive inhibition quickly and dissipate it slowly (Eysenck, 1957). It follows from Hull’s theory that reactive inhibition acts against the building up of reaction potential and that extraverts should form conditioned responses slowly, and this deduction has been directly confirmed by Franks (1957). The application of Eysenck’s theory to the question of educational attainment is simply this: if the theory that acquiring educational skills is a matter of conditioning is correct, then those who learn these skills readily should tend to be introverted and to generate reactive inhibition slowly. There are already available some data to support the first of these deductions. To particularize:

1. A positive association between introversion and educational attainment at the university student level has been reported in England by Furneaux (1956), Broadbent (1958), and Lynn (1959). Although American studies do not use this theoretical framework, it seems likely that such findings as that of Duff and Siegel (1960) that overachievers tend to be unsociable reflect the same association.

2. Introverts tend to have good vocabularies, both in relation to their intelligence (Himmelweit, 1945) and absolutely (Lynn & Gordon, in press).

3. There are no well-developed instruments for measuring introversion-extraversion in children and consequently less is known of the relation of this personality dimension to attainment in this area. However, it is reasonably well-established that delinquents, who display extraverted behavior patterns, tend to be educationally retarded (e.g., McCarthy, 1954).

4. Introverts tend to have leptomorphic body build (i.e., to be thin in relation to their height) and leptomorphic children tend to be good readers (Eysenck, 1959a).

5. Women tend to be more introverted than men (Eysenck, 1959b). Girls tend to do better than boys in the national examination which is taken in England by children at the age of 11 (Yates & Pidgeon, 1957), and in America Terman and Tyler (1954) conclude that “school marks almost universally indicate superior achievement for girls.” Better achievement in relation to intelligence by women is reported at the university level in America by Duff and Siegel (1960).

6. Brain injury tends to make people more extraverted (Eysenck, 1957), and brain injured children tend to have poor educational attainments in relation to their intelligence (Stephen, 1958).

It is perhaps reasonable to conclude that a consideration of these studies suggests that Eysenck’s system pulls together a number of discrete findings and that its possibilities in educational psychology deserve further scrutiny. It is not suggested that the studies referred to above do more than give tenuous support to the application of Eysenck’s theory in this field. It appears that one of the chief weaknesses of the application as it now stands is that the associations outlined are between variables which are somewhat far removed from the individual differences in conditioning and the generation of reactive inhibition that are assumed to underlie them. Hence, although the observed correlations can be derived from the theory, it nevertheless remains true that other plausible explanations for the observations could be put forward. It is evident that the theory needs more rigorous testing of its fundamental postulates. Accordingly, one of the key assumptions has been subjected to experimental test, namely, that individual differences in the generation of reactive inhibition are responsible for differences in educational attainment. If this is so, the two should correlate. A test of this prediction is reported below.


INVESTIGATION 


Subjects. The subjects (Ss) were 82 children with an age range of 8-11, comprising the entire population in this age range of two small schools. There were 36 boys and 46 girls.

Procedure. The following tests were given to the children:

1. Reading: Schonell’s Graded Reading Vocabulary Test. To control the effect of age, a reading score was obtained by subtracting the chronological age from the reading age.

2. Generation of reactive inhibition: (a) A reminiscence test involving inverted number printing. This is an individual test in which S is instructed to print numbers 0-9 upside down as fast as he can over 12 trials; he is then given a 2-minute rest before a final trial. There is of course a gain in speed after the rest interval, this gain being the phenomenon of reminiscence, and a reminiscence score for each S was obtained by subtracting the time on the postrest trial from the mean of the four last prerest trials. It is assumed that this score reflects the amount of reactive inhibition generated, the argument being that reactive inhibition is generated during the task and that the amount of gain following rest reflects the amount of reactive inhibition that has dissipated. For detailed discussion of reminiscence and the part played by reactive inhibition in tasks of this type, the reader is referred to the original experimental reports (Kimble, 1949).

(b) A vigilance task: this is essentially a 
task of maintaining attention and it is as- 
sumed that attention fails after an interval 
of time as a result of reactive inhibition ac- 
cumulating. The task used in this investiga- 
tion consisted of listening to a continuous 
stream of letters delivered from a tape re-
corder at the rate of three every 2 seconds;
the signal to be watched for was the occur-
rence of the same three letters consecutively
and when this occurred the Ss were in-
structed to write down the letter. There were
40 such signals spread over 20 minutes and
the score used was the number correctly re-
corded from the last 20 (all Ss tend to get
the first signals correct). All Ss noticed the
first signal and it is evident therefore that
all Ss understood the instructions and that
failure to notice some signals can be at-
tributed to inattention rather than lack of
intelligence. Evidence that tests of this sort
measure extraversion (and hence individual
differences in the generation of reactive in-
hibition) is presented by Broadbent (1958)
and Eysenck (1959d).

3. Intelligence: The Nufferno test was
given. This test gives separate scores for
speed and accuracy and was chosen because
it was desirable to avoid an intelligence
test of the verbal type, which is likely to be
affected by the introversion-extraversion
factor. The method of scoring speed re-
stricted the use of the speed measure to 48
Ss.


TABLE 1

PRODUCT-MOMENT CORRELATIONS BETWEEN
TESTS OF REACTIVE INHIBITION AND
READING AND INTELLIGENCE

                 Reading    Intelligence   Intelligence
                 Score      (Accuracy)     (Speed)
Vigilance        +.33*      +.08           -.03
Reminiscence     -.23*      -.12           +.09

* Significant at the .05 level.
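The three scoring rules given under Procedure can be stated compactly in code. This is a minimal sketch: the function names, the use of seconds for trial times, and the representation of vigilance responses as one boolean per signal in presentation order are all assumptions for illustration, not details from the report.

```python
from statistics import mean


def reading_score(reading_age: float, chronological_age: float) -> float:
    """Schonell reading age minus chronological age (same units, e.g. years)."""
    return reading_age - chronological_age


def reminiscence_score(prerest_times: list[float], postrest_time: float) -> float:
    """Mean time of the last four prerest trials minus the postrest trial time.

    A positive value is a gain in speed after rest, taken in the text to reflect
    the reactive inhibition that dissipated during the 2-minute rest.
    """
    if len(prerest_times) < 4:
        raise ValueError("need at least four prerest trials")
    return mean(prerest_times[-4:]) - postrest_time


def vigilance_score(detected: list[bool]) -> int:
    """Number of the last 20 of the 40 signals that were correctly recorded."""
    if len(detected) != 40:
        raise ValueError("expected one entry per signal (40 signals)")
    return sum(detected[20:])


# Hypothetical data for one child: 12 prerest trial times (seconds), then the postrest trial
prerest = [34.0, 33.0, 31.0, 30.0, 30.0, 29.0, 29.0, 28.0, 26.0, 25.0, 24.0, 25.0]
print(reminiscence_score(prerest, 22.0))  # 3.0
```

Only the last four prerest trials enter the reminiscence score, so early practice effects on the first trials do not affect it.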


RESULTS AND DISCUSSION 


The product-moment correlations 
of the variables measured are shown 
in Table 1. Although the correlations 
are low, the two measures of reactive 
inhibition are significantly associated 
with reading attainment, and the find- 
ings therefore give some measure of 
support to the hypothesis that a tend- 
ency to generate reactive inhibition 
quickly is detrimental to the acquisi- 
tion of educational skills. The finding 
that speed and accuracy on the intelli- 
gence test are not related to the meas- 
ures of reactive inhibition is probably 
due to the fact that the test used was 
too short for appreciable quantities of 
reactive inhibition to accumulate. It
has been shown by Eysenck (1959c) 
that it is only towards the end of a 
lengthy intelligence test that the per- 
formance of extraverts begins to de- 
teriorate. 
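The significance flags in Table 1 can be checked roughly with Fisher's z transformation of r. This is a large-sample approximation chosen here for illustration, not necessarily the test the author used; the sketch also assumes n = 82 for the reading and accuracy correlations (the text gives no separate count) and n = 48 for speed, as stated under Procedure.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p-value for H0: rho = 0, using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately N(0, 1) under H0
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Correlations from Table 1; the sample sizes are assumptions (see above)
table_1 = {
    ("Vigilance", "Reading"): (0.33, 82),
    ("Reminiscence", "Reading"): (-0.23, 82),
    ("Vigilance", "Accuracy"): (0.08, 82),
    ("Reminiscence", "Accuracy"): (-0.12, 82),
    ("Vigilance", "Speed"): (-0.03, 48),
    ("Reminiscence", "Speed"): (0.09, 48),
}

for (test, criterion), (r, n) in table_1.items():
    p = correlation_p_value(r, n)
    flag = "*" if p < 0.05 else ""
    print(f"{test:12s} x {criterion:8s} r = {r:+.2f}  p = {p:.3f} {flag}")
```

Under these assumptions only the two reading correlations fall below .05, in line with the asterisks in Table 1.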

Although this application of Ey- 
senck’s theory to educational psychol- 
ogy is strengthened by the present 
findings, it is evident that further re- 
search is needed to put the theory on 
a firm foundation. One of the merits 
of the theory is that it generates a 
large number of predictions that are 
susceptible to experimental test. One 
of the most obvious is that there 
should be a correlation between con- 
ditionability and educational attain- 
ment. Another concerns the action of 
drugs. It is argued that stimulant 
drugs shift those who take them to- 
wards the introverted end of the intro- 
version-extraversion dimension (Ey- 
senck, 1957). This is supported by the 
findings that people condition more 
readily under the influence of stimu- 
lant drugs (Hilgard & Marquis, 
1940). It can therefore be predicted 
that children under the influence of 
stimulants should learn more readily 
the educational skills that are as- 
sumed to be acquired through condi- 
tioning. It is possible that this predic- 
tion, if verified and worked out in 
detail, would have useful practical re- 
sults for the treatment of education- 
ally retarded children. 


SUMMARY 


The theory that learning simple ed- 
ucational skills takes place by condi- 
tioning, taken together with Eysenck’s 
personality theory, yields the predic- 
tion that those who do well in educa- 
tional tasks should be introverted 
and generate reactive inhibition 
slowly. It is suggested that this theory 
brings into order a number of findings 
in educational psychology. 

A further prediction was made from
the theory, namely, that good achiev- 
ers should show low indices of reactive 
inhibition as assessed by reminiscence 
and vigilance tasks. An investigation 
of the relation of performance on these 
tasks to reading attainment in chil- 
dren aged 8-11 tended to confirm the 
prediction. 


Teachers have long been concerned
with quizzes’ contribution to learning, 
and many have believed that the use 
of quizzes produces greater student 
achievement. Several early studies 
(Gable, 1936; Hertzberg, Heilman, & 
Leuenberger, 1932; Jones, 1923) indi- 
cate that pupils who are tested periodi- 
cally make somewhat higher scores on 
final examinations than students who 
are not tested periodically prior to the 
final examination. Little research deal- 
ing with this problem has been re- 
ported in recent years. 

Although it seems to be widely ac- 
cepted that quizzes increase learning, 
the reason for increased learning is 
not always clear. One investigator has 
attributed the increased achievement 
to such factors as knowledge of prog- 
ress (Deputy, 1929). Others (Curtis & 
Woods, 1929) have speculated that 
the manner in which the results of 
quizzes or examinations are treated 
may influence learning. 

Perhaps the most obvious explana- 
tion of increased achievement follow- 
ing the use of quizzes would be that 
quizzes provide extrinsic motivation, 
i.e., students will work harder through- 
out the course, because they want to 
get good grades on the quizzes, and 
this yields higher achievement. 

An alternative explanation of in- 
creased learning following the use of 
quizzes might be that the knowledge 
of results of performance on quizzes 
provides the students with a greater 
opportunity to see their areas of 
strength and weakness in the subject 
matter. Students work toward elimi-
nating areas of weakness, thus ob-
taining greater achievement.

Another possibility might be that 
greater learning following the use of 
quizzes is due neither to extrinsic mo- 
tivation nor knowledge of results but 
merely to the fact that frequent 
quizzes structure a course. For ex- 
ample, if an instructor gives quizzes 
to his students, he is in a very real 
sense telling the students: “These 
are the facts and principles that I be- 
lieve are important; remember them!” 

A fourth reason that might be of- 
fered to explain increased achieve- 
ment associated with the use of quizzes 
would be that quizzes may affect 
learning simply through the enforced 
activity with respect to subject matter 
during the quiz itself. 

It is also possible, of course, that 
the use of quizzes may produce differ- 
ing effects under various combinations 
of teaching method and subject mat- 
ter. 

The present study was designed to 
test the principal hypothesis—that 
the use of quizzes is associated with 
increased learning of subject matter— 
and three conditional hypotheses— 
that, if such an increase exists, it is 
associated with the enforced activity 
with subject matter provided by 
quizzes which includes (a) structur- 
ing the course, (b) this in combination 
with knowledge of results of per- 
formance, and (c) these in combina- 
tion with the extrinsic motivation pro- 
vided by quizzes. 


METHOD 


Subjects. Subjects of the experiment were 
104 undergraduate students in four sections 
of an introductory educational psychology
course at Indiana University. Subjects' as-
signment to sections was determined by con-
ventional means, i.e., choice of the individual
student within scheduling limitations, but
the four sections did not differ significantly
with respect to age, sex, intelligence (ACE),
or class standing (chi square and analysis
of variance).

Procedure. Four distinct variations of quiz 
procedure were employed in the four sec- 
tions of introductory educational psychology. 
The four sections were taught by the same in- 
structor, thereby eliminating one important 
source of extraneous variation. The varia- 
tions in quiz procedure were as follows: 

Section A was given weekly written quizzes 
consisting of 20 true-false items. The quiz 
papers were graded by the instructor, grades 
were recorded, and the papers were returned 
to the students at the next class period. This 
was judged to be a quiz situation that would 
produce enforced activity with subject mat- 
ter and a combination of extrinsic motiva- 
tion, knowledge of results, and structuring. 

Section B was given the same weekly writ- 
ten quizzes. The students, however, checked 
their own papers. The papers were kept by 
the individual student and were not seen by 
the instructor. This was judged to be a quiz 
situation that would produce enforced ac-
tivity with subject matter and a combination
of knowledge of results and structuring.

In Section C, the same weekly quiz ma- 
terial was used, but the instructor merely 
read the questions aloud, answering the items 
himself. This was judged to be a situation 
that would produce enforced activity with 
subject matter (required to listen to quiz- 
like information) and structuring. 

Section D was given no quizzes. Course 
content, upon which the quizzes for other 
sections were based, was covered only in con- 
nection with regular classroom lectures and 
discussion. This was judged to be a control 
situation without the extrinsic motivation, 
knowledge of results, structuring, or enforced 
activity with subject matter provided by the 
various quiz conditions. 

In all, 13 quizzes were administered to 
Sections A, B, and C. Six quizzes were admin- 
istered prior to the midsemester examination, 
and seven were administered after the mid- 
semester examination. 

During the first week of the semester, a 
pretest of 100 multiple-choice type items was 
administered to subjects in all four sections. 
The pretest was based upon an item analysis 
of tests used in the same course in previous
years. At midsemester, a different 100-item
multiple-choice type examination was ad-
ministered to all subjects. Fifty of the test
items were common to the pretest; 50 items
were new. At the end of the semester, a 150-
item multiple-choice type examination was
given to all subjects. The test items included
the other 50 items of the pretest, the 50
new items of the midsemester examination,
and 50 new items. All examination questions
were based upon content covered by the
textbook or class lecture and discussion.


TABLE 1

ANALYSIS OF COVARIANCE FOR MIDSEMESTER AND FINAL
EXAMINATION SCORES FOLLOWING FOUR QUIZ PROCEDURES
IN AN EDUCATIONAL PSYCHOLOGY

Source                        df    Adjusted MS       F
Midsemester Examination
   Between Groups              3      324.107      2.62*
   Within Groups              99      123.781
Final Examination
   Between Groups              3      532.755      1.94**
   Within Groups              99      275.000

* p < .06.
** p > .10.


Both the midsemester examination and the
final examination were used as criteria to
assess the effects of the four variations of
quiz procedure. Analysis of covariance was
employed to treat the data. In each analy-
sis, the pretest multiple-choice examination
served as the control variable, thereby sta-
tistically equating the four groups' prior
knowledge of educational psychology.
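A one-way analysis of covariance of the kind described here can be sketched with the standard sums-of-squares approach: regress the criterion on the pretest within groups, then test whether the group means still differ after that adjustment. The sketch below is a generic textbook ANCOVA written from scratch (it assumes a common within-group slope), not the authors' own computation; the function name and input layout are hypothetical.

```python
from statistics import fmean


def _sums(xs, ys, mx, my):
    """Sums of squares and cross-products about the given means."""
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def one_way_ancova(groups):
    """groups: list of (pretest scores, criterion scores) pairs, one per section.

    Returns (F, df_between, df_within, adjusted criterion means).
    """
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n, k = len(all_x), len(groups)
    gx = fmean(all_x)

    # Total sums, ignoring group membership
    txx, txy, tyy = _sums(all_x, all_y, gx, fmean(all_y))
    # Pooled within-group sums
    wxx = wxy = wyy = 0.0
    for xs, ys in groups:
        sxx, sxy, syy = _sums(xs, ys, fmean(xs), fmean(ys))
        wxx, wxy, wyy = wxx + sxx, wxy + sxy, wyy + syy

    ss_within = wyy - wxy ** 2 / wxx   # residual SS about a common within-group slope
    ss_total = tyy - txy ** 2 / txx    # residual SS about a single overall slope
    ss_between = ss_total - ss_within  # adjusted between-groups SS
    df_between, df_within = k - 1, n - k - 1  # one df lost to the covariate
    f = (ss_between / df_between) / (ss_within / df_within)

    b_within = wxy / wxx               # pooled within-group regression slope
    adjusted = [fmean(ys) - b_within * (fmean(xs) - gx) for xs, ys in groups]
    return f, df_between, df_within, adjusted
```

With four sections totalling 104 students, the function returns df_between = 3 and df_within = 104 - 4 - 1 = 99, the degrees of freedom shown in Table 1.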


RESULTS 


If one looks beyond the conventional
.05 level of confidence, there was a
significant difference between the quiz
procedure groups' learning of subject
matter when the midsemester exami-
nation served as the criterion. The F
value obtained by covariance analysis
(F(3, 99) = 2.62) was significant at the
.06 level of confidence, as shown in
Table 1. Adjusted midsemester ex-
amination raw score means for each
section are presented in Table 2. The
only difference between means (ob-
tained through t tests) that was sig-
nificant beyond the .05 level of con-
fidence was that between Sections A
(teacher-graded quiz) and D (no
quiz).


TABLE 2

PRETEST, MIDSEMESTER, AND FINAL EXAMINATION
MEANS FOR FOUR SECTIONS OF AN EDUCATIONAL
PSYCHOLOGY CLASS SUBJECTED TO DIFFERENT
WEEKLY QUIZ PROCEDURES

                                          Means(a)
Section                       n    Pretest   Midsemester   Final Exam
A. Teacher-Graded Quiz       30     33.90      66.23*        101.70
B. Student-Checked Quiz      28     34.86      63.86          94.39
C. Quiz-Type Information     14     36.50      59.73          97.50
D. No Quiz                   32     33.88      59.36          92.41

(a) Midsemester means adjusted by analysis of covariance;
final examination means not adjusted or tested for
significance of difference, because the F test was not
significant.
* Significantly greater than mean of Section D, t =
2.35, p < .05; for all other differences between means,
p > .05.

When the final examination served
as the criterion, there was no
significant difference between quiz pro-
cedure groups' learning of subject mat-
ter. The F value obtained by covari-
ance analysis (F(3, 99) = 1.94) was not
significant at the .10 level of con-
fidence. Unadjusted final examination
means for each group are also shown
in Table 2.
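Each F in Table 1 is the adjusted between-groups mean square divided by the within-groups mean square, and its significance level is the upper-tail probability of the F(3, 99) distribution. The following standard-library sketch evaluates that tail through the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes); it is offered as an independent check, not as the procedure the authors used.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """P(F > f) for an F(df1, df2) variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Mean squares transcribed from Table 1
for label, ms_between, ms_within in [("Midsemester", 324.107, 123.781),
                                     ("Final", 532.755, 275.000)]:
    f = ms_between / ms_within
    print(f"{label}: F(3, 99) = {f:.2f}, p = {f_upper_tail(f, 3, 99):.3f}")
```

Run on the tabled mean squares, it gives a p-value slightly above .05 for the midsemester F and above .10 for the final examination, consistent with the footnotes to Table 1.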


DISCUSSION


The experimental design, with more 
than one independent variable com- 
mon to more than one procedure, does 
not permit the conclusion that any one 
variable in and of itself is related 
to increased learning of subject mat-
ter. But it is in this kind of combina-
tion that variables operate in the
classroom. And the design does permit 
conclusions to the effect that the in- 
dependent variables in their various 
combinations are or are not associated 
with increased learning of subject mat- 
ter. 

Also, the presence of previously used 
(review) items in the midsemester and 
final examination may have had an 
influence upon the results of the ex- 
periment, but this, again, was taken 
to be the typical classroom examina- 
tion procedure. 

It might be noted, too, that final 
examination results, though not re- 
liably different, were in the same gen- 
eral direction as midsemester per- 
formance, and it is possible that the 
pretest examination plus the midterm 
examination produced sufficient ex-
trinsic motivation, knowledge of re-
sults, structure, and enforced activity
with subject matter to overbalance, to
a degree, the quiz effects. 

Results of the study, though, seem 
to indicate that a typical quiz pro- 
cedure used in a typical lecture-dis- 
cussion classroom situation was posi- 
tively related to subject matter 
achievement at midsemester but not 
at the end of the course. Further, it 
would appear that a combination of 
several possible dimensions of quizzes 
—enforced activity with subject mat- 
ter, structuring, knowledge of results, 
and extrinsic motivation—is necessary 
to elicit higher examination perform- 
ance, for only the quiz procedure con- 
taining all these variables had sig- 
nificantly higher achievement on the 
midsemester examination. 


CONCLUSION


It is concluded that the use of 
quizzes will tend to increase students’ 
achievement of subject matter early in 
a lecture-discussion type of course, 
that this is due to the combined influence
of enforced activity with subject matter,
structuring of the course,
knowledge of results of performance, 
and extrinsic motivation provided by 
quizzes, but that the significance of 
the increase in achievement is lost by 
the end of the course, possibly because 
of the overbalancing influence of these 
factors in the examinations them- 
selves. 


This study is concerned with the
relationship of certain variables to 
problem solving ability in seventh and 
eighth grade pupils. The variables in- 
clude problem recognition, word 
fluency, ideational fluency, closure, 
judgment, intelligence, and reading. 
Perhaps it should be emphasized that 
no claim is made for the appropriate- 
ness of the names given the variables; 
the variables can best be described in 
terms of the operations required by the 
respective tests. As far as the writers 
know, the importance of these vari- 
ables has not been previously investi- 
gated through multiple regression 
analysis of large samples. 


METHOD


Subjects 


The subjects (Ss) were 636 seventh and 
eighth grade pupils who had been tested 
extensively in an effort to determine differ- 
ences between good and poor problem 
solvers. All of the pupils attended junior 
high school in either Lansdowne-Aldan or 
Abington, school districts suburban to 
Philadelphia. 

The median chronological age of the 636 
Ss was 154 months with decile deviation 
from 143 to 165 months. The median IQ of 
the 615 Ss with scores on the California Test of
Mental Maturity, Long Form, was 114 with
decile deviation from 97 to 128. Pupils who
were absent for more than three tests were
eliminated; pupils who had missed no more
than three tests were assigned average
scores on the tests missed. This had little
effect on the test statistics, while making
possible a sample of 513.


1 The research reported in this paper
was part of a study of differences between
good and poor problem solvers performed
at the University of Pennsylvania pursuant
to a contract with the Office of Education,
United States Department of Health, Edu-
cation, and Welfare.

The sample appeared to be representa- 
tive of large-city suburban seventh and 
eighth grade pupils. Actually, since the 
processes involved in problem solving ap- 
pear to be common, the sample is probably 
representative of other age and residence 
populations of similar IQ range. 
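The missing-data rule described under Subjects (eliminate pupils absent for more than three tests, assign the rest average scores on the tests they missed) can be sketched as follows. Which group's average was used is not stated; the sketch assumes the mean of the retained pupils who took that test, and the function name and data layout are hypothetical.

```python
from statistics import fmean

MAX_MISSING = 3  # pupils absent for more than three tests are eliminated (rule from the text)


def apply_missing_data_rule(scores):
    """scores: hypothetical mapping of pupil id -> list of test scores, None for a missed test.

    Returns the retained pupils, with each missed test replaced by that test's mean.
    """
    kept = {pid: row for pid, row in scores.items()
            if sum(v is None for v in row) <= MAX_MISSING}
    if not kept:
        return {}
    n_tests = len(next(iter(kept.values())))
    # Assumption: the "average score" is the mean of retained pupils who took the test
    test_means = [fmean([row[j] for row in kept.values() if row[j] is not None])
                  for j in range(n_tests)]
    return {pid: [test_means[j] if v is None else v for j, v in enumerate(row)]
            for pid, row in kept.items()}


pupils = {"a": [10, 20, 30, 40, None], "c": [None, None, None, None, 70],
          "d": [30, None, 50, None, 80]}
print(apply_missing_data_rule(pupils))  # pupil "c" (four tests missed) is dropped
```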


Problem Solving Criterion 


There are no tests of problem solving 
ability, as such; indeed it would be difficult 
to get agreement about what such tests 
should contain. Guilford (1954) believes 
that there are several kinds of problem 
solving and for each kind of problem there 
are usually several possible operations by 
which it can be solved. In his opinion, how- 
ever, general reasoning, fluency of ideas, 
flexibility, evaluation, and originality are 
among the abilities most likely to be im- 
portant in problem solving. According to 
Travers, Marion, and Post (1955), variability-stereotypy,
equivalence of reactions, reasoning, originality, learning set,
and motivation are the relevant variables. 
Guetzkow (1951) suggests that set, fluency, 
and reasoning may account for a substan- 
tial part of problem solving behavior. 
McNemar (1955) considers logical reason-
ing, as identified by Guilford and his associ-
ates, to be “an indispensable, if not a fun-
damental, aspect of problem solving.” It
seems reasonable to assume that reasoning 
is the central ability elicited by the kinds 
of problems useful in controlled experimen- 
tation in the classroom and laboratory. 

To obtain a criterion of problem solving 
ability, three standardized tests were used: 
the Differential Aptitude Tests of verbal 
and abstract reasoning and the Davis-Eells 
Games. A fourth test, Thought Problems, 
was constructed because of the need for a 
relatively difficult test containing reasoning 
problems of various kinds. About half of 
the problems were taken from Burt’s (1919) 
tests of reasoning; several of these have 
been used in various studies of reasoning 
and problem solving. 

The tests were administered during the
last 2 weeks of October 1958 by teachers
who had been selected and given instruc-
tions beforehand. The details of the entire
testing program were worked out by the
principals and guidance counselors. The
intercorrelations, standard deviations, and
reliability coefficients of the separate tests
are shown in Table 1. The reliability of the
first test was estimated by the Kuder-Rich-
ardson Formula 21; the reliabilities of the
second and third, from half-test coefficients,
stepped up.


TABLE 1

INTERCORRELATIONS, STANDARD DEVIATIONS, AND
RELIABILITY COEFFICIENTS OF CRITERION TESTS

Test                      2       3       s      Reliability
1. (VR + AR)            .517    .767    8.42    .905
2. Davis-Eells                  .474    4.25    .648
3. Thought Problems                     6.88    .855
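The two reliability estimates mentioned above are standard formulas: Kuder-Richardson Formula 21, which needs only the number of items, the mean, and the variance of total scores, and the Spearman-Brown correction that "steps up" a half-test correlation to full length. A minimal sketch follows; the item counts needed to recompute the tabled coefficients are not given in the text, so the example only illustrates the direction of the correction.

```python
def spearman_brown(r_part: float, factor: float = 2.0) -> float:
    """Reliability of a test `factor` times as long as the part whose reliability is r_part.

    With factor = 2 this steps a half-test correlation up to full-test length.
    """
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


def kr21(k: int, mean: float, variance: float) -> float:
    """Kuder-Richardson Formula 21 for a test of k dichotomously scored items."""
    return (k / (k - 1)) * (1.0 - mean * (k - mean) / (k * variance))


print(round(spearman_brown(0.48), 3))  # 0.649
```

For instance, a half-test correlation of about .48 steps up to about .65, the order of the Davis-Eells coefficient in Table 1.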

The three tests were weighted inversely 
as the standard errors of measurement. The 
obtained weights were approximately equal; 
hence, the composite or criterion measure 
of problem solving ability was the sum of 
scores on the three tests. The reliability 
coefficient of the composite, estimated by 
Kelley’s (1927) method, was about .93. The 
validity of the criterion is considered by 
Tate, Stanier, and Harootunian (1959). 
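The weighting step can be reproduced from the values in Table 1. The sketch below takes the standard error of measurement as s * sqrt(1 - r) and computes the reliability of the unweighted sum with the usual composite formula, 1 - (sum of error variances) / (composite variance). We assume, but the text does not confirm, that this is equivalent to the Kelley (1927) method the authors cite.

```python
import math

# Values transcribed from Table 1 (criterion tests 1-3)
sds = [8.42, 4.25, 6.88]          # standard deviations, s
rels = [0.905, 0.648, 0.855]      # reliability coefficients
intercorr = {(0, 1): 0.517, (0, 2): 0.767, (1, 2): 0.474}

# Standard error of measurement: s * sqrt(1 - r)
sem = [s * math.sqrt(1.0 - r) for s, r in zip(sds, rels)]
raw_w = [1.0 / e for e in sem]               # weights inversely proportional to the SEM
weights = [w / sum(raw_w) for w in raw_w]    # rescaled to sum to 1 for comparison

# Variance of the unweighted sum of the three tests
var_c = sum(s ** 2 for s in sds) + 2 * sum(
    r * sds[i] * sds[j] for (i, j), r in intercorr.items())
# Composite reliability: 1 - (sum of error variances) / (composite variance)
rel_c = 1.0 - sum(e ** 2 for e in sem) / var_c

print("SEM:", [round(e, 2) for e in sem])
print("relative weights:", [round(w, 3) for w in weights])
print(f"composite SD = {math.sqrt(var_c):.2f}, reliability = {rel_c:.3f}")
```

The relative weights all come out near one third, which is why a simple sum could serve as the criterion; the composite reliability is about .93, and the composite standard deviation (about 17.0) is close to the 17.06 reported for the final sample in Table 2.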


Predictor Variables 


Five of the seven predictor variables 
were compounded from various group 
tests administered to the Ss during 
November 1958, and January 1959; 
the remaining two variables, IQ and 
reading grade equivalent, were taken 
from the school records. 

The means, standard deviations, and 
reliability coefficients of the various 
tests representing each variable are 
shown in Table 2. Since the distribu- 
tions of several tests were markedly 
nonnormal, the scores on all tests were 
converted to normalized standard
scores. Composite scores on each variable
were obtained by averaging standard
scores on the tests included under
that variable.
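The text does not say how the normalized standard scores were obtained. One common approach, used here purely as an illustrative assumption, converts each raw score to a mid-rank percentile, (rank - 0.5) / n, and then to the corresponding unit-normal deviate; a pupil's composite on a variable is then the mean of his normalized scores on that variable's tests, as the text describes.

```python
from statistics import NormalDist, fmean


def normalized_scores(raw):
    """Normalized standard (z) scores from mid-rank percentiles: z = Phi^-1((rank - 0.5) / n)."""
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied raw scores
        while j + 1 < n and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def composite(*tests):
    """Average each subject's normalized scores over the tests belonging to one variable."""
    columns = [normalized_scores(t) for t in tests]
    return [fmean(per_subject) for per_subject in zip(*columns)]


# Hypothetical raw scores of five pupils on two closure tests
print(composite([16, 12, 20, 16, 9], [40, 45, 52, 38, 30]))
```

Normalizing before averaging keeps a markedly skewed test from dominating the composite, which is the concern the text raises about nonnormal distributions.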

TABLE 2

MEANS, STANDARD DEVIATIONS, AND RELIABILITY
COEFFICIENTS OF THE PROBLEM SOLVING CRITERION
AND THE TESTS FOR THE SEVEN PREDICTOR
VARIABLES FOR 513 SUBJECTS

Variable                      Mean      SD     Reliability(a)
Problem Solving               82.50    17.06      .93
Problem Recognition
   Seeing Problems            20.80     8.93      .85
   Missing Facts              18.72     5.37      .86
Word Fluency
   First Letter               48.80    13.53      .86
   First and Last Letter      13.78     5.45      .68
Ideational Fluency
   Reasons                    20.80     8.66      .81
   Ideas                      30.04    10.60      .85
   Groups of Things           25.63     8.31      .66
   Uses                       18.82     8.60      .76
Closure
   Incomplete Pictures        16.58     3.73      .71
   Incomplete Words           18.38     5.78      .88
   Concealed Figures          45.13     9.59      .88
Judgment
   Estimation                 27.80     5.59      .37
   Best Answer                50.02    10.51      .77
   Critical Thinking          33.97     7.32      .66
Intelligence Quotient        113.35    12.32      (b)
Reading Grade Equivalent       8.87     1.35      (b)

Note. Ss having no scores on three or fewer tests
were assigned mean scores on the missed tests. Actual
frequencies on tests with fewer than 513 subjects were:
Seeing Problems, 501; Missing Facts, 507; First Letter,
504; First and Last Letter, 505; Reasons, 511; Ideas, 512;
Groups of Things, 503; Uses, 506; Incomplete Words,
512; Concealed Figures, 495; Estimation, 498; Best
Answer, 482; and Critical Thinking, 464.

(a) All reliability coefficients, except Problem Solving,
estimated from half-test scores in a 20% random sample
of total group of about 600. Reliability coefficient of
Problem Solving estimated by Kelley's method.
(b) Reliability coefficient not estimated but generally
found to be satisfactory.


All of the tests except those of closure and critical
thinking were tried out and refined one or more times.
The predictor variables and their respective
tests are described briefly below.
Problem Recognition. Various in- 
vestigators of problem solving make 
frequent reference to the recognition 
of, formulation of, or orientation to
the problem and usually regard it as a 
crucial part of the preparation process. 
Common sense, as well as research, 
supports the importance of the prob- 
lem recognition variable. One cannot 
seriously quarrel with the sense of the 
dictum, “a problem well stated is a 
problem half solved.” 

The “sensitivity to problems” and 
“conceptual foresight” factors identi-
fied by Guilford, Kettner, and Chris-
tensen (1955) would appear to be re-
lated to problem recognition. The
tests selected as measures of problem 
recognition were adapted from or sug- 
gested by Guilford’s studies. Descrip- 
tions of the tests and methods of 
scoring are as follows: 


Seeing Problems asked the S to write 
down problems which might come about in 
connection with a pencil and with a candle. 
Four minutes were allowed for each task. 
The score was the total number of sensible 
problems listed. 

Missing Facts consisted of 30 problems in 
arithmetic. Some of the problems could not 
be solved because necessary information 
was missing. The S was asked to list the 
fact or facts needed where necessary in- 
formation was lacking and to write nothing 
where all necessary information was given. 
Thirty-five minutes were allowed for the 
test. The score was the number of blanks 
correctly filled or left empty. 


Word Fluency. Word association or 
word fluency tests have been widely 
used in the experimental study of 
association. Beginning with Thur- 
stone’s (1938) study, word association 
tests have been used in numerous fac-
tor analyses of mental ability and have
consistently yielded a factor currently 
identified as word fluency. 

Our word fluency tests were similar 
to some of those used by McNemar 
(1955) in her study of good and poor 
reasoners. It will be seen that the fol- 
lowing tests require the Ss to produce 
rapidly words fulfilling certain require- 
ments. 


First Letter required the S to write as 
many words as possible beginning with a 
given letter. There were two parts. In the 
first part, the S had to write words begin- 
ning with P; in the second, words beginning 
with B. Three minutes were allowed for 
each part. The score was the total number 
of recognizable words on the two parts 
minus duplications. 

First and Last Letters was similar to the 
test devised by Thurstone (1938). In the 
two sections of the test, the S was asked to 
write as many words as possible beginning 
with S and ending with L, and beginning 
with C and ending with T. Four minutes 
were allowed for each section. The score was 
the total number of correct words written. 
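The two word fluency scoring rules can be sketched as a single counting function. Whether a response was a "recognizable word" was a scorer's judgment; here a small hypothetical DICTIONARY set stands in for it, and applying the removal of duplicates to First and Last Letters is our assumption (the text states it only for First Letter).

```python
def fluency_score(responses, dictionary, first, last=None):
    """Count distinct words that start with `first` (and end with `last`, if given)
    and appear in `dictionary`. Repeated words are counted once."""
    words = {w.strip().lower() for w in responses}
    return sum(
        1 for w in words
        if w in dictionary and w.startswith(first) and (last is None or w.endswith(last))
    )


# Hypothetical stand-in for the scorer's judgment of what counts as a recognizable word
DICTIONARY = {"pen", "pig", "bat", "ball", "sail", "seal", "cat", "coat"}

# First Letter: words beginning with P, then with B; the two parts are summed
first_letter = (fluency_score(["pen", "Pen", "pig", "pqz"], DICTIONARY, "p")
                + fluency_score(["bat", "ball", "bat"], DICTIONARY, "b"))
# First and Last Letters: S...L words, then C...T words
first_and_last = (fluency_score(["sail", "seal", "soil"], DICTIONARY, "s", "l")
                  + fluency_score(["cat", "cot", "coat"], DICTIONARY, "c", "t"))
print(first_letter, first_and_last)
```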


Ideational Fluency. Adkins and 
Lyerly (1952) and several other factor 
analysts have identified a factor de- 
scribed as ideational fluency. The fac- 
tor involves the facility to call up ideas 
when quantity rather than quality is 
emphasized. 

Ideational fluency is a relatively new 
construct, and there has been little 
study of its correlatives and predictive 
value. At present, the case for it in 
thinking is more logical than empirical. 
Certainly, it is logical to suppose that 
the person capable of rapidly producing 
ideas, other things being equal, will 
solve more problems than one less 
capable. 

Descriptions of the tests and meth- 
ods of scoring are given below. 


Reasons was similar to a test used by 
Adkins and Lyerly (1952). It consisted of 
two questions: “Why do people like to have
trees in their back yards?” “Why do people
want to live in cities?” The S was asked to
write as many reasons as possible. The score 
was the number of different, sensible rea- 
sons given for both questions. Six minutes 
were allowed for each question. 

Ideas was similar to Topics used by 
Adkins and Lyerly (1952) which they credit 
to R. B. Cattell. The S was instructed to 
write as many ideas as possible about two 
topics: “A man going up a ladder” and “A
man driving a truck down a street.” Five
minutes were allowed for each topic. The 
score was the number of ideas listed. 

Groups of Things was derived from a test 
used by Adkins and Lyerly (1952) which
they credit to C. W. Taylor. The S was asked 
to list names of things that are round or 
could be called round and of things that are 
like a square or rectangle. Three minutes 
were allowed for each task. The score was 
the number of things listed which met spec- 
ifications. 

Uses was similar to a test used by Guil- 
ford, Kettner, & Christensen (1956). Two 
common objects, newspaper and brick, were 
given, and the S was asked to list as many 
uses for each as he could. Four minutes 
were allowed for each task. The score was 
the total number of uses listed. 


Closure. Thurstone (1944, 1949) in 
his study of perception and mechanical 
aptitude was apparently the first to 
identify the factors flexibility of closure 
and speed of closure. The ability to 
identify a figure in the midst of or in 
spite of perceptual distractions char- 
acterizes the former; the ability to 
unify an incomplete or discrete per- 
ceptual field into a single percept, the 
latter. Thurstone’s work showed that 
the closure factors were related to a 
reasoning factor, induction. 

Botzum (1951), Adkins and Lyerly 
(1952), and Pemberton (1952) identi- 
fied both factors in their analyses of 
reasoning tests. Whether or not the two
closure factors are related to each other is
not at present clear. There is no doubt, however,
about the relationship between closure and
reasoning factors. As Johnson (1955) 
points out, this relationship adds some 
support to the arguments of Binet and 
the gestalt psychologists for a con- 
nection between perceptual reorganiza- 
tion and cognitive reorganization. 

The closure tests used in this study 
are briefly described below. 


Incomplete Pictures was Thurstone’s 
adaptation of the Street Gestalt Completion 
Test. The S was asked to identify an object 
or objects from the outline or parts given. 
The score was the number of objects cor- 
rectly identified. 

Incomplete Words was Thurstone’s Muti- 
lated Words test. The S was asked to iden- 
tify words from the outline or parts given. 
Five minutes were allowed for the test. The
score was the number of words correctly 
identified. 

Concealed Figures was Thurstone’s adap- 
tation of the Gottschaldt Figures Test and 
was made available through the courtesy of 
W. A. Botzum. The S was asked to identify 
simple designs hidden in more complex de- 
signs. The score was the number of correct 
identifications. 


Judgment. Although judgment would 
seem to be involved one way or another 
at every point in problem solving, it 
probably is true that it does not come 
fully into play until the problem solver 
has reached possible solutions. At any 
rate, it is convenient to treat it as the 
final process in problem solving. 
Johnson (1955, p. 282) summarizes the 
role of judgment in problem solving in 
the words, 


In general, judgment is a conclusive or 
decisive process, not a productive one, that 
brings a thoughtful episode to an end. In 
the case of highly differentiated human 
beings it is a retroflex process in that the 
thinker takes into account the motivational 
and instructional conditions that initiated 
the thoughtful episode. 


Guilford (1957) and his associates 
(1956) have attempted to explore the 
factorial composition of judgment or 
evaluation. Their studies suggest that 
among the abilities involved are 
practical estimation, practical judg- 
ment, and logical reasoning. 

To measure the judgment variable, 
Estimation and Best Answer, two 
relatively complex tests of judgment, 
were constructed. A third test, Critical 
Thinking, was especially devised for 
the study.² The tests and methods of
scoring are described below. 


Estimation consisted of 30 multiple-choice items involving size or number. In scoring, two points were given for the best answer to an item and one point for an answer adjacent to the best answer. The score was the total number of points. Thirty minutes were allowed for the test.

² We are indebted to Ethel Maw of Bryn Mawr College for constructing the test and also for helping in the scoring of the ideational fluency and similar nonobjective tests.
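The Estimation scoring rule can be written out as a short function. This is a minimal sketch that assumes, since the text does not say so explicitly, that each item's options are ordered along the size or number scale, so an "adjacent" answer is the option one position away from the keyed best answer. The function names are hypothetical.

```python
def estimation_item_points(chosen: int, best: int) -> int:
    """Points for one Estimation item.

    Assumes options are indexed in order along the size/number scale,
    so an 'adjacent' answer is one position away from the keyed best answer.
    """
    distance = abs(chosen - best)
    if distance == 0:
        return 2  # the best answer
    if distance == 1:
        return 1  # an answer adjacent to the best answer
    return 0


def estimation_score(chosen: list[int], best: list[int]) -> int:
    """Total points over all items (30 items give a maximum of 60)."""
    if len(chosen) != len(best):
        raise ValueError("one response is needed per keyed item")
    return sum(estimation_item_points(c, b) for c, b in zip(chosen, best))


# Usage: three items answered best, adjacent, and two steps away.
print(estimation_score([2, 1, 4], [2, 2, 2]))
```

Under this reading a perfect paper on the 30-item test scores 60.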

Best Answer was suggested by Cardall’s 
(1950) Test of Practical Judgment. It con- 
tained 36 questions, each followed by 4 
possible answers. The S was asked to choose 
the best answer and the next-best answer 
for each question. In scoring, the answers 
were ranked from 1, best, to 4, worst, on a 
key. The differences between the S’s rank- 
ings and the key rankings were summed 
and subtracted from 100. Hence, the highest 
possible score was 100, and the lowest —44. 
Twenty-five minutes were allowed for the 
test. 
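The Best Answer score can be checked against the stated range. A sketch, assuming that only the two choices the S makes are compared with the key: the S's best choice counts as rank 1 and the next-best as rank 2. The largest discrepancy per question is then 4 (for example, choosing the key's worst option as best and its third option as next-best), so 36 questions give 100 - 144 = -44, matching the stated minimum. The function name and data layout are illustrative.

```python
def best_answer_score(key_ranks, responses, base=100):
    """Score the Best Answer test under the assumptions stated above.

    key_ranks: one dict per question mapping each option to its key rank
               (1 = best ... 4 = worst).
    responses: one (best, next_best) pair of option labels per question.
    """
    if len(key_ranks) != len(responses):
        raise ValueError("one response pair is needed per question")
    total_difference = 0
    for ranks, (best, next_best) in zip(key_ranks, responses):
        # S's own ranking: chosen best = rank 1, chosen next-best = rank 2.
        total_difference += abs(ranks[best] - 1) + abs(ranks[next_best] - 2)
    return base - total_difference


key = [{"a": 1, "b": 2, "c": 3, "d": 4}] * 36
print(best_answer_score(key, [("a", "b")] * 36))  # complete agreement with the key
print(best_answer_score(key, [("d", "c")] * 36))  # largest possible disagreement
```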

Critical Thinking consisted of a number 
of paragraphs containing information about 
topics in general science and social studies. 
Following each paragraph was a list of state- 
ments relating to the paragraphs. The S was 
asked to mark each statement as true, 
probably true, false, probably false, or in- 
determinate in light of the information 
given. Thirty-five minutes were allowed for 
the test. The score was the number of state- 
ments correctly marked. 


Intelligence. Problem solving and 
intelligence could be equated, pro- 
vided the latter were thought of as 
somewhat different from what is 
elicited by typical tests of intelligence. 
Several studies have attempted to 
determine the relationship between 
test intelligence and problem solving 
or reasoning with various results. 
Harootunian (1959) reviewed most of 
these investigations and provided
further evidence that intelligence tests, 
at best, account for about half of the 
variance of problem solving. 

Both of the junior high schools involved 
in this study regularly give the California 
Test of Mental Maturity, Long Form, in 
seventh grade during the first month of 
school. The IQs of the Ss therefore were taken
from the school records, those for the eighth
grade dating back to September 1957. 


Reading. Several years ago, Thorn- 
dike (1917) observed, 


Understanding a paragraph is like solving 
a problem in mathematics. It consists in 
selecting the right elements of the situation
and putting them together in the right rela- 
tions .... The mind...must select, re- 
press, soften, emphasize, correlate, and 
organize, all under the influence of the right 
mental set . . . . Reading an explanatory or 
argumentative paragraph ... involves the 
same sort of organization and analytic ac- 
tion of ideas as occur in thinking of sup- 
posedly higher sorts. 


Others since Thorndike have called 
attention to the essential similarity be- 
tween reading and thinking, and 
several factor analyses have found 
reasoning as well as verbal factors in 
reading tests. 


Prior to 1958, both schools had given the 
California Achievement Tests, but Lans- 
downe-Aldan changed to the Iowa Every- 
Pupil Tests of Basic Skills. The reading 
score used in this study was simply the 
grade equivalent score on either test, as ob- 
tained from the school records. 


RESULTS 


Product-moment coefficients of cor- 
relation between all pairs of variables 
were found from the normalized stand- 
ard scores and are presented in Table 
3. The coefficients range from a low of 
.169 between ideational fluency and 
closure to a high of .732 between prob- 
lem solving and reading; all are signifi- 
cant with p < .001. 

It will be noticed that the coefficients 
indicate that the predictor variables 
differ considerably in importance. 
Ideational fluency explains the least of
the variance of problem solving ability,
only 8%; reading explains the most,
54%.
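Squaring a product-moment coefficient gives the proportion of criterion variance that one predictor accounts for on its own; for reading, .732 squared is about .536, the 54% cited above. A minimal sketch using the first row of Table 3 (values as read from the scanned table; the variable names are illustrative):

```python
# Correlations of each predictor with problem solving (Table 3, row 1).
r_with_problem_solving = {
    "Problem Recognition": 0.624,
    "Word Fluency": 0.417,
    "Ideational Fluency": 0.290,
    "Closure": 0.396,
    "Judgment": 0.707,
    "IQ": 0.682,
    "Reading": 0.732,
}


def percent_variance_explained(r: float) -> float:
    """Percent of criterion variance predictable from one variable alone (100 * r^2)."""
    return 100 * r * r


for name, r in sorted(r_with_problem_solving.items(), key=lambda item: item[1]):
    print(f"{name:20s} r = {r:.3f}  r^2 = {percent_variance_explained(r):4.1f}%")
```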

The beta coefficients, as well as the 
multiple coefficients, for the combina- 
tions of seven, six, and five predictor 
variables are shown in Table 4. The 
statistics indicate that, while problem 
recognition is not the most important 
variable, it contributes substantially to 
problem solving, as defined. When IQ and reading are eliminated from the regression equation, problem recognition is the second best of the variables, accounting directly for about 11% of the problem solving variance.

TABLE 3
COEFFICIENTS OF CORRELATION BETWEEN THE VARIABLES

Variable                     2     3     4     5     6     7     8
1. Problem Solving        .624  .417  .290  .396  .707  .682  .732
2. Problem Recognition          .401  .563  .294  .552  .519  .598
3. Word Fluency                       .460  .332  .365  .403  .490
4. Ideational Fluency                       .169  .281  .318  .363
5. Closure                                        .272  .362  .350
6. Judgment                                             .541  .637
7. IQ                                                         .558
8. Reading

When combined with the other variables, word fluency becomes relatively unimportant in accounting for problem solving. As a matter of fact, its beta coefficient is significant at the .01 level only when IQ and reading are dropped as predictors, and even then it accounts for only 1% of the variance. Apparently, the ability to produce words rapidly according to certain requirements has a role in problem solving, but it is a very minor one.

The negative beta coefficients in Table 4 for ideational fluency indicate that the correlation between this variable and problem solving ability is negative when the influence of other variables is statistically controlled. It is possible only to speculate why this is so. It may be that individuals who call up ideas in quantity do so at the expense of quality and tend to be unable to separate the superficial from the important and, hence, tend to be poor judges of information and possible problem solutions. Mention was made earlier that it is logical to suppose that the person capable of rapidly producing ideas, other things being equal, will solve more problems than one less capable. It may be that, experimentally speaking, other things are not ordinarily equal.

Although closure contributes significantly as a member of each combination, the percentage of problem solving variance which can be attributed to it is very small, ranging from less than 1% to slightly more than 2%. It would seem that closure has little importance in problem solving ability, as defined.

TABLE 4
BETA COEFFICIENTS AND MULTIPLE COEFFICIENTS OF CORRELATION BETWEEN
PROBLEM SOLVING AND SELECTED VARIABLES

                               Beta Coefficient
Variable                      Seven         Six        Five
                         Predictors  Predictors  Predictors
Problem Recognition           .1830       .2590       .3369
Word Fluency
Ideational Fluency           -.1030      -.1094      -.1087
Closure                       .0795       .1013       .1520
Judgment                      .2690       .3712       .4707
IQ                            .2733
Reading                       .3060
(Multiple Coefficient)        .8482       .8234       .7859

* Not significant at the .01 level. All other coefficients are significant with p < .01.

Of the five variables, judgment con- 
tributes most to the variance of prob- 
lem solving. In the seven-variable 
combination it accounts for almost as 
much as IQ and reading; in the others 
it explains the most. In fact, when the 
five composites alone are considered, 
more than one-third of the explained 
variance of problem solving can be
attributed directly to judgment. 
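The percentages quoted in this section ("about 11%" for problem recognition, "less than 1% to slightly more than 2%" for closure, "more than one-third" of the explained variance for judgment) agree with the squares of the Table 4 beta weights. The sketch below therefore assumes that "accounting directly" means beta squared, and that a share of the explained variance means beta squared divided by R squared. It also assumes the five-predictor betas can be recovered from the Table 3 correlations by solving the normal equations R_xx * beta = r_xy; with three-decimal correlations the results should agree with the Table 4 five-predictor column only up to rounding. All names are illustrative.

```python
# Five predictors: problem recognition, word fluency, ideational fluency,
# closure, judgment. Correlations copied from Table 3.
NAMES = ["Problem Recognition", "Word Fluency", "Ideational Fluency", "Closure", "Judgment"]
R_XX = [  # predictor intercorrelations
    [1.000, 0.401, 0.563, 0.294, 0.552],
    [0.401, 1.000, 0.460, 0.332, 0.365],
    [0.563, 0.460, 1.000, 0.169, 0.281],
    [0.294, 0.332, 0.169, 1.000, 0.272],
    [0.552, 0.365, 0.281, 0.272, 1.000],
]
R_XY = [0.624, 0.417, 0.290, 0.396, 0.707]  # correlations with problem solving


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


betas = solve(R_XX, R_XY)                            # standardized regression weights
r_squared = sum(b * r for b, r in zip(betas, R_XY))  # R^2 = sum of beta_j * r_jy
for name, b in zip(NAMES, betas):
    # beta^2 taken as the predictor's "direct" share of criterion variance
    print(f"{name:20s} beta = {b:+.4f}  beta^2 = {100 * b * b:4.1f}%")
print(f"R = {r_squared ** 0.5:.4f}; judgment's share of explained variance = {betas[4] ** 2 / r_squared:.2f}")
```

Solving from the correlations, rather than reading the betas off Table 4, also shows that the negative ideational fluency weight follows directly from its overlap with problem recognition and word fluency.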

The importance of intelligence, as 
tested, in explaining problem solving 
ability is evidenced by its two rela- 
tively high betas. These coefficients 
were not unexpected in light of the 
results from other investigations. Of 
particular interest, however, is the fact 
that in neither regression equation does 
IQ account for the greatest proportion 
of problem solving variance. It is 
noteworthy that the elimination of 
reading as a predictor left the beta 
coefficient for IQ relatively unchanged, 
while others increased considerably, 
particularly the betas for judgment 
and problem recognition. As a matter 
of fact, judgment seems to be just as 
important as IQ in explaining problem 
solving, if not more so. 

In the seven-variable combination, 
the greatest proportion of problem 
solving variance is accounted for by 
reading. The evidence from this analy- 
sis adds support to Thorndike’s (1917) 
conclusion that reading and problem 
solving are essentially similar proc- 
esses. It is evident that the abilities 
elicited by reading are fully as impor- 
tant in problem solving as any of the 
variables studied. 


SUMMARY 


The purpose of this study was to 
estimate the importance of problem 
recognition, word fluency, ideational 
fluency, closure, judgment, test intel- 
ligence, and reading ability in problem 
solving. The criterion variable, prob- 
lem solving, was measured by the 
composite score on the Differential 
Aptitude Tests of verbal and abstract 
reasoning, the Davis-Eells Games, and 
a test containing 40 relatively difficult 
reasoning problems. The predictor 
variables were measured by tests 
similar to those used in factor analyses 
of reasoning ability and by tests constructed for the study. Two or more
tests were used for each variable, and 
scores were combined by the nor- 
malized standard score technique. The 
sample consisted of 513 seventh and 
eighth grade pupils in suburban 
Philadelphia. 
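The normalized standard score technique is not spelled out here. One common version converts each raw score to a percentile rank and then to the unit-normal deviate at that percentile, after which the scores on a variable's several tests can be averaged. The sketch below assumes the mid-rank convention (rank - 0.5) / n, shared average ranks for ties, and equal weighting of tests; all three are assumptions, and the function names are hypothetical.

```python
from statistics import NormalDist


def normalized_standard_scores(raw):
    """Convert raw scores to normal deviates through mid-rank percentiles.

    Assumption: percentile = (rank - 0.5) / n, with tied scores sharing the
    average of the ranks they occupy.
    """
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and raw[order[end + 1]] == raw[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based rank shared by the tied block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    return [unit_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def composite(*tests):
    """Equal-weight mean of each pupil's normalized scores across several tests."""
    per_test = [normalized_standard_scores(t) for t in tests]
    return [sum(z) / len(z) for z in zip(*per_test)]


# Usage: two hypothetical tests taken by the same five pupils.
print([round(z, 2) for z in composite([12, 15, 15, 20, 9], [30, 41, 38, 44, 25])])
```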

Multiple regression analysis indi- 
cated that the most important of the 
seven predictor variables were reading, 
test intelligence, judgment, and prob- 
lem recognition. Closure, word fluency, 
and ideational fluency made little 
independent contribution to the vari- 
ance of the criterion. Of special interest 
was the fact that, when the influence 
of other variables was statistically 
controlled, the correlation of ideational 
fluency with problem solving was 
negative. 
THE RELATIONSHIP OF AGE TO ADULT READING SCORES


A. W. ANDERSON 
University of Western Australia 


Of recent years greater interest has 
been taken in the problems of reading 
development at the adult level. In 
principle, at least, reading experts no 
longer incline to the view that reading 
teaching is completed at the elemen- 
tary grade and the lower high school 
levels. This article outlines a point 
which may warrant further investiga- 
tion and which could, if established 
experimentally, have considerable in- 
fluence on programs of reading devel- 
opment especially at the postelemen- 
tary school levels. 

In simple terms the problem is to 
consider the degree to which various 
aspects of reading might be regarded 
as skills, which like other skills are 
likely to deteriorate unless revised 
from time to time. Three aspects are 
considered here, namely, Vocabulary, 
Speed of Comprehension, and Level of 
Comprehension, as measured by the 
Co-operative Reading Test, Form Q, 
Lower Level C1.


METHOD 


The sample is an accidental sample, no 
controls being possible, and of course such a 
sample tends to limit this to an illustratory 
and exploratory study only. Since 1957 the 
writer has conducted an adult reading im- 
provement program for the Western Aus- 
tralian Adult Education Board, working 
with small groups of about 20 people, and 
emphasizing the development of reading
rate (Wheeler & Anderson, 1958). As part of
the program tests were administered at the 
beginning and at the end of the course. The 
figures considered in this article relate to 
the precourse test scores. The number who 
took the postcourse test was considerably 
less than those who began the course and as 
the retesting was done in varying circum- 
stances the figures are not reported here. 
With a larger sample it is hoped at a later 
date to consider any changes in the correlations, which may have occurred as a result
of the course work, and the degree to which 
any age group might show greater or less im- 
provement than other age groups. Although 
asked to provide their ages, not all people 
did, and this sample consists of 171 males 
and 107 females for whom test scores and 
ages were available. The age ranges are from 
18 to 62 years for males and 20 to 58 years for 
females. As can be seen from Table 1 the 
sample is heavily biased towards the pro- 
fessional and clerical occupations and it 
would be difficult to generalize to the popu- 
lation as a whole. Females are considered as 
a composite group as it was difficult to categorize female occupations, most females listing their occupation as “housewife.”


RESULTS 


Table 1 shows the ages and reading scores
of the male occupational subgroups, the total
male group, and the female group; Tables 2
and 3 show the intercorrelations of age and
the reading subscales.
Considering first the male group, it 
may be seen that the Vocabulary score 
correlates positively with Age, but that 
the Speed of Comprehension and Level 
of Comprehension scores correlate 
negatively and that this relationship 
holds in the two main male subgroups 
of professional and clerical. The female 
group shows a similar trend, but the 
negative correlations are not signifi- 
cant. 
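Whether a given correlation with age is significant can be checked from r and the group size alone. The article does not say which test was used; this sketch assumes Fisher's z approximation (atanh(r) is roughly normal with standard error 1/sqrt(n - 3)) and a two-tailed test. The correlations are the Age entries of Table 2, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0 via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Correlations with Age from Table 2 (171 males, 107 females).
for label, r, n in [
    ("Males, Speed", -0.25, 171),
    ("Males, Level", -0.33, 171),
    ("Females, Speed", -0.15, 107),
    ("Females, Level", -0.05, 107),
]:
    print(f"{label:15s} r = {r:+.2f}  p = {correlation_p_value(r, n):.3f}")
```

With groups of this size the approximation is close to the usual t test, so either choice should classify these four coefficients the same way.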


DISCUSSION


No intelligence measure was availa- 
ble, so that even though intelligence is 
known to correlate with reading, it is 
not possible to estimate any relation- 
ships in this study. 

The main point of issue seems to be that the Vocabulary scores in the sample group increase with age while scores in Speed and Comprehension are likely to show a decrease. The writer suggests tentatively that this might possibly illustrate a difference between knowledge and skill. Vocabulary development may be regarded as the development of word knowledge, and as people grow older their experience of words will broaden. Thus one might expect vocabulary to correlate positively with age, if for no other reason than that older people have been exposed to more words than younger people of comparable ability. On the other hand, rate and comprehension may be dependent on skills, and the greater the time lapse after reading training in school, the poorer the level of achievement.

TABLE 1
AGE AND READING SCORES OF WESTERN AUSTRALIAN ADULT SAMPLE
(Co-operative Reading Test, Form Q, Lower Level)

                         Age         Vocabulary  Speed       Level       Total Reading
Group                N       M    SD      M    SD      M    SD      M    SD      M    SD
Professional males   59   31.9  10.9   68.0   9.3   57.8  10.0   58.7   9.2   62.1   8.5
Clerical males       75   36.8  10.8   68.2   8.8   56.4  10.6   57.5   8.0          8.9
Other males          37   38.4  12.7   66.3  11.7   49.7   9.9   52.6   9.2   56.2   9.9
All males           171   35.4  11.5   67.7   9.7   55.4  10.7   56.9   8.9   60.4   9.3
All females         107   37.9  13.1   67.7  10.6   52.4  10.6   54.7   9.5   58.6   9.6

Note. The speed score is for Speed of Comprehension, not merely rate of reading.

In Australian schools reading training has hitherto been restricted to the elementary school and few, if any, of the adults in this group have had any reading instruction beyond the age of about 12 years (the end of elementary schooling). Australian education has tended to emphasise word knowledge rather than reading skills, and there is evidence to indicate that while Australian University students are superior to their American counterparts in Vocabulary (Anderson, 1957) they score lower in reading skills. It seems probable therefore that while vocabulary continues to develop, reading skill is likely to decline with age after formal schooling ceases. This would be even more so in the case of the sample used in this study as the male professional and clerical groups are likely to be engaged in restricted critical reading tasks without a leavening of general reading. It appears that one might make out a reasonable case for a regular program for the revision of reading skills beyond the period of formal education, and aimed more at the retention of skills than at the development of vocabulary.

TABLE 2
CORRELATIONAL MATRICES FOR ALL MALES AND ALL FEMALES

              Age     Vocabulary   Speed     Level
Age                   .35*         -.15      -.05
Vocabulary    .29*                 .59*      .70*
Speed         -.25*   .53*                   .84*
Level         -.33*   .45*         .80*

Note. Correlations for females are above the diagonal; correlations for males are below the diagonal.
< &.

TABLE 3
CORRELATIONAL MATRICES FOR PROFESSIONAL AND CLERICAL MALE GROUPS

              Age     Vocabulary   Speed     Level
Age                   .31*         -.29*     -.47**
Vocabulary    .26*
Speed         -.22*   .50**                  .64**
Level         -.28*   .45**        .83**

Note. Correlations for the Professional Group are above the diagonal; correlations for the Clerical Group are below the diagonal.
*p < .05.
**p < .01.


SUMMARY 


This article reports the reading 
scores of a sample of Western Austral- 
ian adults and suggests the possibility 
that although word knowledge appears
to increase with age, reading skills required in speed and comprehension
may deteriorate with age unless some 
regular corrective practice is carried 
out. However, because of the sampling 
problem the results can only suggest an 
area of further study. 
VERBAL AND IDEATIONAL FLUENCY IN SUPERIOR TENTH GRADE STUDENTS¹

CARL BEREITER²
University of Wisconsin


Among the creative thinking abilities 
that have been identified by Guilford 
and others (Guilford, 1957a; Wilson, 
Guilford, Christensen, & Lewis, 1954), 
fluency abilities are clearly the most 
accessible to objective measurement. 
Dealing as they do only with the quan- 
titative aspect of creative thinking— 
how many ideas, solutions, and the 
like can be produced—they are perhaps 
not as central to the study of creative 
processes as are such abilities as origi- 
nality and flexibility, but they repre- 
sent what is probably the most attain- 
able beachhead for an attack on this 
difficult domain. 

The study of fluency has been lim- 
ited almost entirely to fluency in verbal 
performance, but the concept of 
fluency can readily be extended to 
other kinds of intellectual performance, 
as has been done by Guilford in his 
“Structure of Intellect” (1956, 1957b).
In the fields of art and design, for 
instance, fluency would be manifested 
in the ability to produce many con- 
figurations or designs. In mathematics, 
engineering, and architecture, it would 
be manifested in facility in producing 
or applying formal structurings of elements (as in the ability to find many situations fitting a given mathematical model or the ability to plan many room arrangements within a single building shell).

¹ Paper presented at the American Educational Research Association, Atlantic City, February 1960, based on the writer’s doctoral dissertation, School of Education, University of Wisconsin. The writer is indebted to the members of his thesis committee, Thomas A. Ringness (Chairman), Chester W. Harris, and E. James Archer, and to Herbert J. Klausmeier and Robert Fischer for their assistance in carrying out this study.

² Now at Mary Conover Mellon Foundation, Vassar College.

Probably the most definitive study 
to date of verbal fluency abilities has 
been that by Guilford and Christensen 
(1956), carried out on adult males. In 
light of the current interest in early 
identification of creative talent and in 
the utilization of female talent, it 
seemed desirable to attempt a replica- 
tion of the results of this study using 
younger Ss and to explore the possi- 
bility of sex differences in the pattern- 
ing of fluency abilities (something 
which no previous study in this area 
has done). 

The above considerations led to the 
formulation of three purposes for the 
present study: to investigate fluency 
in the use of nonverbal materials, to 
compare the verbal fluency factors ob- 
tained with younger Ss with those 
obtained with adults, and to compare 
fluency factors obtained with boys with 
those obtained with girls. 


METHOD 


Tests. A battery of 18 tests was assembled, including both reference tests for previously identified verbal fluency factors and new tests designed to tap areas of nonverbal content. Table 1 provides brief descriptions of these tests. The first 10 are tests that contributed most to the identification of verbal fluency factors in the Guilford and Christensen (1956) study.³ Certain of these tests were modified to adapt them to younger Ss (Bereiter, 1959), but the changes were minor enough that, except possibly in Test 7, no changes in factorial composition were believed likely. In that test new classes of objects were used which appeared freer of figural or structural content: “old-fashioned” and “dangerous” replaced original items dealing with use, composition, or shape of objects.

³ As recently revised (Guilford, Fruchter, & Kelley, 1959), the “Structure” would lead to somewhat different hypotheses, but the present study was under way before this revision appeared.

TABLE 1
DESCRIPTIONS OF REFERENCE AND EXPERIMENTAL FLUENCY TESTS

Test | Task | Hypothesized Content
1. Word Fluency, Form Aᵃ | Write words containing one specified letter. | WFᵇ
2. Suffixes | Write words containing a specified suffix. | WF
3. Controlled Associations | Write synonyms for given word. | AF
4. Simile Insertions | Produce attributes that two given objects have in common. | AF
5. Plot Titles | Write titles for story plots. | IF
6. Brick Uses | List different uses for a brick; score is number listed. | IF
7. Object Namingᶜ | Write thing names fitting somewhat restricted classes. | IF
8. Expressional Fluency, Form Aᵃ | Write four-word sentences; first letter of each word is given. | EF
9. Word Arrangement | Write sentences containing four specified words. | EF
10. Simile Interpretation | Compose more or less complete expressions of the attributes two objects have in common. | EF
11. Product Design | Draw designs for car grilles and lampshades, outlines of car fronts and lamp bases being supplied. | FIF
12. Design Synthesis | Draw different designs using three given figures. | FIF
13. Alphabet Design | Design possible new letters for the alphabet. | FIF
14. Form Completion | Name objects that could be drawn by adding lines to given figures. | FIF
15. Linkages | Draw devices for connecting Objects A and B so that when A is moved in an indicated direction, B will move in an indicated direction. | SIF
16. Partitions | Draw different ways to separate objects into pairs by the use of a limited number of straight lines. | SIF
17. Connections | Draw lines connecting specified objects without one line crossing another. | SIF
18. Structural Functions | Produce (verbally) ideas based on the formal relationships between objects; e.g., places to hide a rope, tasks suitable to an 8-foot tall person. | SIF

ᵃ Published by Sheridan Supply Co., Beverly Hills, Calif., Copyright 1957.
ᵇ The following abbreviations are used:
   WF   Word Fluency
   AF   Associational Fluency
   IF   Ideational Fluency
   EF   Expressional Fluency
   FIF  Figural Ideational Fluency
   SIF  Structural Ideational Fluency
ᶜ Based on Thing Listing II.

The remaining eight tests are new tests developed on the basis of hypotheses suggested by Guilford’s “Structure of Intellect” (1956, 1957b).⁴ This scheme suggested
the existence of two previously unidentified 
factors which may be labeled Figural Idea- 
tional Fluency and Structural Ideational 
Fluency. Figural Ideational Fluency was 
interpreted broadly to involve such tasks 
as thinking of ideas for pictures, completing 
pictures, varying designs, or recombining 
elements into various designs. Structural 
Ideational Fluency was interpreted as in- 
volving the production of formal systems as 
opposed to concrete figures or substantive 
ideas. Mathematics provides the most ob- 
vious example of such content, but it was 
found impossible to invent divergent pro- 
duction (as opposed to single-solution) 
items of mathematical content which were 
sufficiently easy. Instead, tests involving 
mechanical and spatial relationships were 
devised, on the premise that in such tests 
it is formal relationships between objects 
which are critical rather than objects them- 
selves. A fuller description of the experi- 
mental tests and an account of their devel- 
opment is given in Bereiter (1959). 

Subjects. The 18 tests were administered 
to a total of 265 tenth grade Ss, 103 boys and 
162 girls, in three urban Wisconsin high 
schools. All had been identified by their 
schools as academically superior and were 
enrolled in special classes for such students. 
One hundred twenty-eight of the Ss were 
volunteers who came to two weekend test- 
ing sessions. The rest were selected by their 
schools for testing during regular school 
hours. 

Method of Analysis. Separate factor 
analyses were performed for boys and girls, 
using Rao’s (1955) canonical factor analysis 
method and Lawley’s (1940) test for the sig- 
nificance of residuals, as programed for the 
IBM Type 650 computer by Harris and 
Pierce (1956). Orthogonal normal varimax 
rotations were made of the canonical factors.⁵
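The varimax computation itself is not described here. As a sketch of one standard procedure, and assuming "normal varimax" refers to varimax with Kaiser's row normalization, the code below follows the pairwise approach associated with Kaiser: each pair of factors is rotated in its plane by the angle that maximizes the varimax criterion for that pair, and sweeps repeat until no angle exceeds a tolerance. It is not the Harris and Pierce IBM 650 program. The function name, tolerance, and sweep limit are illustrative.

```python
import math


def varimax(loadings, tol=1e-10, max_sweeps=200):
    """Normal (Kaiser-normalized) varimax rotation of a tests-by-factors matrix."""
    p, m = len(loadings), len(loadings[0])
    # Kaiser normalization: divide each test's row by the square root of its communality.
    h = [math.sqrt(sum(a * a for a in row)) for row in loadings]
    x = [[a / hi if hi > 0 else 0.0 for a in row] for row, hi in zip(loadings, h)]
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = [x[i][j] ** 2 - x[i][k] ** 2 for i in range(p)]
                v = [2 * x[i][j] * x[i][k] for i in range(p)]
                a, b = sum(u), sum(v)
                c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # tan(4 phi) = (d - 2ab/p) / (c - (a^2 - b^2)/p); atan2 picks the maximizing angle.
                phi = math.atan2(d - 2 * a * b / p, c - (a * a - b * b) / p) / 4
                largest_angle = max(largest_angle, abs(phi))
                cos_phi, sin_phi = math.cos(phi), math.sin(phi)
                for i in range(p):
                    xj, xk = x[i][j], x[i][k]
                    x[i][j] = xj * cos_phi + xk * sin_phi
                    x[i][k] = -xj * sin_phi + xk * cos_phi
        if largest_angle < tol:
            break
    # Undo the normalization so each test keeps its original communality.
    return [[a * hi for a in row] for row, hi in zip(x, h)]


# Usage: two clean factors rotated 30 degrees away from simple structure.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
t = math.radians(30)
tilted = [[a * math.cos(t) - b * math.sin(t), a * math.sin(t) + b * math.cos(t)] for a, b in simple]
for row in varimax(tilted):
    print([round(v, 3) for v in row])
```

Because the rotation is orthogonal and the normalization is undone at the end, each test's communality is unchanged; only the distribution of variance across factors moves toward simple structure.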


RESULTS 


Nine canonical factors significant at the 5% level were extracted from the correlation matrix for boys and six from the matrix for girls. After rotation, however, six factors that could be regarded as common factors remained for both sexes. These rotated factors are reported in Tables 2 and 3.

⁴ The writer wishes to express his gratitude to J. P. Guilford for permission to adapt and use these tests.

⁵ The writer is indebted to Henry F. Kaiser, originator of the varimax method, for carrying out these rotations.

As a basis for matching factors ob- 
tained for boys with factors obtained 
for girls, a least squares approximation 
of the former factor matrix to the 
latter was carried out (Bereiter, 1959, 
pp. 69-70, 124-125), the elements of 
the transformation matrix indicating 
the contribution of each factor to the 
approximation. In the following factor 
interpretations, factors for boys and 
girls are considered in pairs wherever 
a clear matching was indicated. The 
convention of treating loadings with 
absolute values of .30 or higher as 
“significant” has been followed. 
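The matching step can be illustrated with a small least squares computation. The text does not give the formula, so this is a minimal sketch that assumes an unrestricted fit (no orthogonality constraint on the transformation): with A the boys' loadings and B the girls' loadings for the same 18 tests, it solves the normal equations (A'A)T = A'B, so that column j of T shows how the boys' factors combine to approximate girls' factor j. Bereiter (1959, pp. 69-70) may differ in detail, and the function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def solve_many(a, b):
    """Solve a @ X = b for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares_transformation(boys, girls):
    """T minimizing the summed squared error of boys @ T against girls."""
    bt = transpose(boys)
    return solve_many(matmul(bt, boys), matmul(bt, girls))


# Usage with a toy 4-test, 2-factor example in which girls = boys @ T exactly.
boys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
true_t = [[0.5, 0.2], [-0.3, 0.9]]
girls = matmul(boys, true_t)
print(least_squares_transformation(boys, girls))
```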

Factors A and M. Factor A (girls) 
resembles Factor M (boys) in having 
substantial loadings on verbal tests 
that involve the production of fairly 
commonplace, low level ideas. These 
tests lend themselves to a routine, 
“grinding out” method for obtaining
a high score. They differ, however, in 
that Factor M (boys) has its highest 
loading on a drawing test, Design 
Synthesis, which has an insignificant 
loading on Factor A (girls). Of the 
nonverbal tests, Design Synthesis is 
the most suited to a grinding out of 
low level productions, but it remains 
a question why its loadings should be 
so different on the two factors. 

Two explanations may be suggested. 
One is that girls may respond quite 
differently to verbal content than they 
do to nonverbal content but that boys 
do not show this distinction—a hy- 
pothesis that will be seen to gain sup- 
port from other factor comparisons as 
well. The other is that the high loading of Design Synthesis on Factor M may be an incidental consequence of conditions of test administration. Plot Titles, Design Synthesis, and Structural Functions were the last three tests administered in the battery, and their intercorrelations were the highest ones obtained for boys. This suggests the operation of some motivational or fatigue factor which was not equally effective on boys and girls. In spite of this discrepancy, the title Production of Low Level Ideas appears to fit the composition of both factors.

TABLE 2
ORTHOGONAL VARIMAX ROTATION OF CANONICAL FACTOR PATTERN FOR 18 FLUENCY TESTS GIVEN TO 162 TENTH GRADE GIRLS

                                       Factors
Tests                          A     B     C     D     E     F
1. Word Fluency              .09  -.14   .24   .44   .12   .23
2. Suffixes                  .07   .11   .08   .61   .07   .05
3. Controlled Associations   .32   .14   .24   .14   .61   .07
4. Simile Insertions         .34   .28   .11   .04
5. Plot Titles               .69   .23   .08   .08
6. Brick Uses                .53   .05   .21   .15   .34   .32
7. Object Naming             .25   .49   .13   .38
8. Expressional Fluency      .22  -.07   .62   .20
9. Word Arrangement          .26   .13   .70   .06   .00   .13
10. Simile Interpretation    .67   .03   .29   .01   .17   .09
11. Product Design           .30   .47   .29  -.01   .11   .38
12. Design Synthesis         .28   .25   .08   .05
13. Alphabet Design          .14   .62   .23   .09  -.05   .28
14. Form Completion          .20   .34   .36
15. Linkages                 .13   .34   .05   .01   .28   .19
16. Partitions               .05   .20   .17  -.26   .12
17. Connections             -.06   .56  -.05  -.02   .08   .01
18. Structural Functions     .60   .14   .16   .20   .34

TABLE 3
ORTHOGONAL VARIMAX ROTATION OF CANONICAL FACTOR PATTERN FOR 18 FLUENCY TESTS GIVEN TO 103 TENTH GRADE BOYS

                                       Factors
Tests                          M     N     O     P     Q     R
1. Word Fluency              .16   .59   .10   .02   .04
2. Suffixes                  .02   .24   .06   .04   .54
3. Controlled Associations   .21   .52   .26   .24  -.02
4. Simile Insertions         .35   .33   .56   .26  -.08   .05
5. Plot Titles               .70   .21   .2    .11  -.16   .04
6. Brick Uses                .48   .10   .31   .38   .06   .20
7. Object Naming             .24   .32   .23   .16   .28   .01
8. Expressional Fluency      .09   .51   .06   .27   .12
9. Word Arrangement          .06   .58   .23   .05
10. Simile Interpretation    .42   .36  -.27
11. Product Design           .38   .25   .62   .05   .09   .02
12. Design Synthesis         .74   .00   .20   .03   .15   .11
13. Alphabet Design          .12   .13   .7    .09   .14   .16
14. Form Completion          .35   .19   .12
15. Linkages                 .24   .01   .14  -.01   .51
16. Partitions               .02  -.04   .12   .02  -.09   .59
17. Connections              .13   .13   .09  -.02   .00   .11
18. Structural Functions     .68   .20   .09   .32   .10   .09

Note. Three factors, each with only one loading of .30 or more in absolute value, are not reported. The three significant loadings were .55 on Test 17, .30 on Test 8, and .38 on Test 7.

Factors C and N. Factor N is the 
only strong factor for boys determined 
entirely by verbal tests. It is quite 
undifferentiated, containing reference 
tests of all four of the verbal fluency 
factors identified by Guilford and 
Christensen (1956). It therefore ap- 
pears appropriate to label it, following 
Zimmerman (1953), Verbal Fluency, 
indicating a general fluency in the use 
of words or phrases. Factor C (girls) 
resembles Factor N except that the 
tests determining it are limited to ones 
involving the use of words in meaning- 
ful contexts. The two tests with highest 
loadings are reference tests of Expres- 
sional Fluency, and the factor appears 
to fit French’s (1951, p. 209) descrip- 
tion of that factor as the “ability to 
think rapidly of the wording for ideas.” 

Factors D and Q. The only tests 
having significant loadings on Factor 
D (girls) are Suffixes and Word Flu- 
ency, tests for the familiar Word 
Fluency factor. Factor D may thus 
be confidently identified as Word Flu- 
ency, which French (1951) describes 
as “entirely limited to the speed of 
producing any words which fit certain 
mechanical restrictions regarding the 
letters or affixes used” (p. 249). Factor
Q (boys) most nearly approximates 
this factor, but it is specific to the 
Suffixes test, so that nothing more
can be said of it than that it “suggests” 
a Word Fluency factor. 

Factor E. This factor for girls is determined mainly by Controlled Associations, but the significant loadings on
Object Naming, Form Completion, and
Brick Uses suggest an underlying task
similarity. In all four tests S is pre- 
sented with a stimulus (word or pic- 
ture) to which she associates verbally, 
and there is a certain indefiniteness as 
to what constitutes an appropriate 
response. Variance on these tests may 
thus arise not only from differences in 
command of words but also from differ- 
ences in the looseness or rigor with 
which Ss interpret the given restric- 
tions. On this basis, the factor may be 
identified with the factors of Associa- 
tional Fluency that have appeared in 
personality studies (cf. Cattell, 1953, 
pp. 193-204). 

Factor P. This factor for boys has 
some similarity of pattern to Factor 
E, above, but not enough to justify 
matching them. It is also similar to 
another factor for boys, Factor M, 
which was identified as Production of 
Low Level Ideas; all four of the tests 
having significant loadings on P also 
have significant loadings on M. Be- 
cause of this ambiguity, the factor 
must be left unidentified, but it is 
suggested that it may represent some 
component of low level idea produc- 
tion that is accounted for by freedom 
in associating to stimuli. 

Factors B and O. Factor B (girls) is 
loaded by four nonverbal tests, two 
of which were intended to have figural 
content and two of which were 
intended to have structural content. 
The most obvious characteristic which 
they have in common is that they re- 
quire S to devise some configuration 
out of nothing, so to speak, as opposed 
to Design Synthesis and Partitions, 
in which S is told what figures to use. 
On this basis the factor may be identi- 
fied as Figure Production. 

Factor O (boys) is loaded by two 
tests involving figure production, but 
also by the two simile completing tests.
This combination suggests that a rele- 
vant variable might be the esthetic 
or “arty” character of the tests—the 
one kind having to do with designs, 
the other with figures of speech. It is 
interesting to note in this connection 
that Simile Interpretation, as well as 
a simile completing test not greatly 
different from Simile Insertions, had 
significant loadings on Originality in 
the Guilford and Christensen (1956) 
study. Factor O may therefore be 
some kind of esthetic aptitude or orig- 
inality factor whose nature cannot 
be more clearly described because suit- 
able measures of such factors were not 
included in the study. 

Factor F. This factor for girls seems 
to be a logical complement to Factor 
B (girls); whereas B is determined by 
tests in which figures must be produced 
out of nothing, F is determined by 
tests in which figural elements are 
supplied. In Design Synthesis and 
Partitions, the tests with highest load- 
ings on the factor, the elements enter- 
ing into the designs are completely 
specified and the task is simply one 
of placing or arranging the elements. 
To a lesser extent this is true even of 
the three verbal response tests loading 
the factor—Form Completion, Struc- 
tural Functions, and Brick Uses. They 
seem to involve ideas about the place- 
ment or arrangement of elements. 
There is a suggestion here of structural 
content, but because of the factor’s 
complementary relationship to Factor 
B, it seems more appropriate to iden- 
tify it as Figure Manipulation. 

Factor R. Factor R (boys) is a dou- 
blet composed of two of the nonverbal 
tests included to measure Structural 
Ideational Fluency. A doublet, based 
as it is upon a single correlation co- 
efficient, is weak evidence on which to 
base a new factor identification; but, 
assuming the correlation not to be
spurious, the factor clearly fits the
description of Structural Ideational 
Fluency as a fluency in producing 
mechanical and spatial ideas—Parti- 
tions being a spatial test and Linkages 
being mechanical. 


Discussion 


The results of the present study are 
dominated by sex differences which 
are so sweeping that any matching of 
factors for the two sexes is tenuous. 
The most obvious general sex differ- 
ence is that the factor structure is 
much less clear for boys than for girls. 
This may be only an effect of the selec- 
tion of tests, however; for if different 
dimensions do exist for the two sexes, 
it may merely be that the tests chosen 
were more appropriate for isolating 
dimensions for the girls. 

In the verbal area the factors for 
boys show signs of immature develop- 
ment—a general factor plus rudiments 
of other verbal factors. In the idea 
producing area, however, the situation 
may be reversed. The idea producing 
factors for girls seem to be differentia- 
ted on rather simple-minded bases. 
One involves verbal tests and the other 
two involve nonverbal tests which are 
differentiated according to whether de- 
sign elements are supplied or whether 
they must be made up. It thus appears 
that the more concrete aspects of the 
tasks are what matter with girls and 
that more abstract aspects of test 
content make less difference. For boys, 
on the other hand, the concrete aspects 
seem of little importance: verbal and 
nonverbal tests share high loadings 
on several factors. Instead, the ab- 
stract aspects seem to be the bases 
for differentiation of factors—the kinds 
of ideas required, whether original or 
routine, esthetic or commonplace, fig- 
ural or structural. 

The possibility that these differences 
are due to sampling error or bias must 
be considered. Estimating the standard
error of a factor loading remains an 
unsolved problem. Ns of 103 and 162, 
as used in the present study, yield 
quite reliable correlation coefficients, 
however, so that it seems unlikely that 
sampling error could account for such 
extensive differences. Bias in the selec- 
tion of Ss is an even less likely explana- 
tion. Bias might be present; criteria 
for placing boys and girls in special 
classes, while ostensibly the same, 
might be somewhat different in prac- 
tice. Such biases could do mischief in 
the comparison of mean scores, but it 
would take an extraordinarily biased 
selection to yield groups in which the 
correlations between variables were 
atypical. It therefore seems warranted 
to look for some psychological explana- 
tion for the sex differences. 
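The claim that Ns of 103 and 162 "yield quite reliable correlation coefficients" can be made concrete with the large-sample standard error of a correlation. The sketch below is an illustration only, not the author's computation: it assumes the Fisher z transformation, whose standard error is approximately 1/sqrt(N - 3), and uses a hypothetical observed r of .40 for both sexes.

```python
import math


def fisher_z_interval(r: float, n: int, z_crit: float = 1.96) -> tuple:
    """Approximate confidence interval for a Pearson r using Fisher's z.

    z_crit = 1.96 gives a two-sided 95% interval under the normal
    approximation; this is an illustrative sketch, not the study's method.
    """
    z = math.atanh(r)              # Fisher z transformation of r
    se = 1.0 / math.sqrt(n - 3)    # large-sample standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# The two Ns from the study; r = .40 is a hypothetical observed value.
for n in (103, 162):
    lo, hi = fisher_z_interval(0.40, n)
    print(f"N = {n}: SE(z) = {1 / math.sqrt(n - 3):.3f}, "
          f"95% interval for r = .40: ({lo:.2f}, {hi:.2f})")
```

Under these assumptions the interval runs from roughly .22 to .55 for N = 103 and from .26 to .52 for N = 162, so sampling error alone would shift a single coefficient by about .15 at most; a wholesale reversal of factor patterns would require many such shifts at once.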

At a low level of inference, these 
findings say simply that the tests used 
in the present study did not measure 
the same things for boys and girls. 
Any suggested explanation of this 
difference must necessarily go well 
beyond the data, but the following 
hypothesis is advanced as one that 
predicates a minimum of discontinuity 
in basic mental organization between 
the sexes. One aspect of fluency that 
would seem to be especially significant 
among people of relatively high gen- 
eral mental ability is one related to 
inhibition. Among intelligent Ss, those 
who perform best on a particular
fluency test are likely to be ones who 
are least intimidated, bewildered, or 
otherwise inhibited by the nature of 
the test. The impression that the writer 
acquired while administering these 
tests, and one that should not surprise 
teachers of tenth grade children, was 
that the girls responded with much 
more emotion to the immediate and 
superficial aspects of each test than 
did the boys—with greater extremes 
of delight, despair, bewilderment,
enthusiasm, or indignation as the case
might be. It is possible that these initial 
reactions had an influence on per- 
formance that was only partly offset 
by the effect of the less obvious intel- 
lectual content of the tests. Thus tests 
that were superficially alike—e.g., ones 
involving drawing figures, ones involv- 
ing arranging figures, ones involving 
listing words—tended to be more 
highly correlated with each other than 
tests that tapped the same ability but 
in different ways. (This is similar to 
the heterotrait-monomethod versus 
monotrait-heteromethod distinction 
discussed by Campbell and Fiske, 
1959.) Boys, on the other hand, being
less affected by the immediate im- 
pressionistic aspects of the test, may 
have revealed more individual differ- 
ences in ability to handle particular 
kinds of intellectual content. If this 
hypothesis is sound, then it would be 
predicted that sex differences in factor 
structure would tend to diminish with 
increasing age, as girls become less 
emotionally reactive to such things as 
tests. 

The prevalence of sex differences 
obscures results bearing on the other 
two concerns of this study—the exist- 
ence of fluency factors with nonverbal 
content and the comparability of ver- 
bal fluency factors in younger Ss with 
those obtained in studies of adults. 
Verbal and nonverbal tests identified 
different factors for girls, but the hy- 
pothesized Figural and Structural 
Ideational Fluency factors were not 
discernible. For boys, on the other 
hand, there was no clear separation 
of verbal and nonverbal content, but 
a weak Structural Ideational Fluency 
factor could be distinguished. All that 
may be said in general of the results 
is that the inclusion of nonverbal tests 
led to the appearance of more fluency 
factors than had appeared in studies 
limited to verbal tests, but that the 
nature of these additional dimensions
remains unclear. 

In contrast to recent studies using 
adult males (Guilford & Christensen, 
1956), the present analysis of verbal 
fluency abilities in tenth grade boys 
yielded not four verbal fluency factors 
but only a single general one. For girls, 
factors appeared which could be identi- 
fied with the four factors previously 
obtained. Except for Word Fluency, 
however, the factors differed from 
those obtained by Guilford and Chris- 
tensen in a number of ways that can 
be only briefly summarized here. Ex- 
pressional Fluency appeared to be 
more general than its counterpart in 
the Guilford-Christensen study, in- 
volving the whole meaningful use of 
language rather than just the use of 
phrases and larger units of expression. 
Associational Fluency emerged as a 
factor more closely related to the con- 
cept of association as it is used with 
clinical tests, implying a facility in 
associating to a stimulus, rather than 
an ability to use words meaningfully, 
as had been implied by Guilford and 
Christensen. In the verbal ideational 
fluency factor, the ideas involved were 
of the type generally thought of as 
ideas—solutions to problems, simple 
inventions, etc.—rather than ideas in 
the sense of elements in logical cate- 
gories, as in Guilford and Christensen’s 
definition. 

Expressional Fluency is the only 
factor for the girls which suggests less 
mature intellectual development than 
the corresponding factor for adults. 
The other factors are definable in 
simpler terms and are, we would argue, 
at least as meaningful and well-defined 
as the factors obtained in other studies. 
Before further research can resolve 
these differences and proceed to clarify 
dimensions of fluency involving non- 
verbal tests, it would appear essential 
to explore further the extent and
stability of the sex differences revealed
in this study. 


SUMMARY 


A battery of 18 tests, consisting of 
10 reference tests for verbal fluency 
factors and eight new tests designed 
to measure hypothesized factors of 
fluency in the production of figural and 
mechanical and spatial ideas, was ad- 
ministered to 103 male and 162 female 
academically superior tenth grade stu- 
dents. Factor analyses by a maximum 
likelihood method and analytical ro- 
tations were carried out on data for 
boys and girls separately. 

Quite different factor patterns were 
obtained for the two sexes. For girls 
the reference factors of verbal fluency— 
Word Fluency, Associational Fluency, 
and Expressional Fluency—were sub- 
stantially replicated, but for boys only 
a general verbal fluency factor ap- 
peared. Three idea producing factors 
emerged for both sexes. For girls the 
characteristics which appeared to de- 
termine the factorial composition of 
the ideational tests were (a) whether 
they involved writing or drawing and 
(b), in the case of drawing tests, 
whether the elements used in the draw- 
ings were specified or left to the sub- 
ject’s improvisation. For boys the 
critical characteristic appeared to be 
the nature of the ideas involved. One 
factor was loaded by tests calling for 
the production of commonplace ideas, 
another by tests involving more es- 
thetic ideas, and a final factor was 
loaded by tests of mechanical and 
spatial content. The verbal factors ob- 
tained for girls suggested certain sim- 
plifications in the definition of pre- 
viously identified factors. The factors 
obtained in the area of nonverbal idea- 
tional fluency abilities indicated that 
such a domain of fluency abilities does 
exist and is accessible to measurement; 
but the important dimensions of that 
domain did not appear to have been
isolated. 
THE IDENTIFICATION OF CREATIVITY IN ADOLESCENTS
ELLEN V. PIERS, JACQUELINE M. DANIELS, AND JOHN F. QUACKENBUSH
Pennsylvania State University


The subject of creativity, so long neglected by psychologists, is currently being revived. Some investigators, for example Barron (1958), are working with prominent adults in a variety of fields to try to determine aspects of the creative personality. Others (Guilford, Kettner, & Christensen, 1954, 1956; Guilford, Wilson, & Christensen, 1952) are using factor analytic methods in an attempt to validate the construct and explore its generality. Guilford and his co-workers have devised some novel types of tests (most of which are still unpublished) from which promising factors have been identified. Two factors which have emerged consistently, and which Guilford considers most useful in identifying persons with creative talent, are Originality, currently defined as the production of uncommon, clever, or remote responses, and Ideational Fluency, defined as the speed of calling up ideas, independent of their quality, in a situation in which there is relatively little restriction. Guilford places more weight on Originality.

Drevdahl (1956), in a study with college students rated by faculty members on a seven-point scale of creativity, found that Guilford's Originality was one of the measures which significantly differentiated those above and below the mean on the creativity rating.

Guilford (1950) has stated that creativity is more than intelligence and cannot be accounted for adequately in terms of IQ. He has also stated that one of the most important aspects of the problem is the discovery of creative promise in our children and youth.


The present study is an attempt 
(a) to assess the usefulness of some 
of Guilford’s experimental tests with a 
younger age group, (b) to report sex 
and grade differences on these tests, 
and (c) to determine the degree to 
which the tests agree with teacher 
ratings of creativity. 

Getzels and Jackson (1958) at- 
tempted to differentiate the highly 
creative from the highly intelligent 
adolescent by means of five creativity 
measures taken or adapted from Guil- 
ford and Cattell, and then to compare 
the two groups on various measures. 
In the present study, however, the entire
sample is of above average intelligence,
so that differences in creativity can be
examined within this group.


METHOD 


Subjects. Ss were 114 seventh and eighth 
grade students from a junior high school. 
They ranged in age from 11 years, 2 months 
to 14 years, 1 month. Fifty-seven of the Ss 
constituted the two top seventh grade 
classes and 57 constituted the two top eighth 
grade classes, which had been ability 
grouped by a combined ranking on intelli- 
gence and achievement test scores. Recent 
IQs from the Otis Intelligence Test were
available from the school records of 110 Ss. 
Table 1 shows the mean ages and IQs of the 
sample according to sex and grade. The dis- 
crepancy between the mean IQ of the sev- 
enth and eighth grades could not be ex- 
plained by the school. 

Measures. Guilford Tests of Creativity: Brick Uses requires the S to give as many uses for a brick as he can in 10 minutes. An ideational fluency score is obtained from the number of acceptable responses. Consequences requires the S to give as many answers as he can in 2 minutes to each of four questions, such as “What would happen if all books were destroyed?” Direct or immediate consequences constitute the low quality or fluency score, while indirect consequences constitute the remoteness or originality score. Plot Titles requires the S to give as many titles as he can in 3 minutes to each of four short plots. Each response is rated as “clever” or “nonclever” by judges. Nonclever responses constitute the low quality or ideational fluency score, while clever responses constitute the originality score. Quick Responses requires the S to respond with the first word that comes to his mind to each of a list of 50 words read at the rate of 5 seconds per word. An originality score is obtained by ranking the response words in terms of their frequencies, assigning a weight to each word, and then summing the weights for all of the S's responses. Unusual Uses requires the S to give as many unusual uses as he can for each of six common objects, in two 5-minute periods. An originality score is derived by summing the number of acceptable responses.
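The Quick Responses scoring (rank responses by frequency, weight them, sum the weights) can be sketched as follows. The exact weights are not given here, so the scheme below is a hypothetical one: for each stimulus word, distinct response frequencies are ranked from most to least common, and each response is weighted by that rank, so rarer associations earn more credit. The function name `originality_scores` and the sample data are illustrative only.

```python
from collections import Counter


def originality_scores(responses: dict) -> dict:
    """Frequency-rank weighting for a word-association test (hypothetical scheme).

    `responses` maps each subject ID to a list of that subject's responses,
    one per stimulus word, in the same stimulus order for every subject.
    """
    subjects = list(responses)
    n_items = len(responses[subjects[0]])
    scores = {s: 0 for s in subjects}
    for item in range(n_items):
        # How many subjects gave each response to this stimulus word.
        counts = Counter(responses[s][item] for s in subjects)
        # Rank the distinct frequencies from most to least common:
        # the commonest response gets weight 1, rarer responses get more.
        ranked_freqs = sorted(set(counts.values()), reverse=True)
        weight = {freq: rank for rank, freq in enumerate(ranked_freqs, start=1)}
        for s in subjects:
            scores[s] += weight[counts[responses[s][item]]]
    return scores


# Three hypothetical subjects responding to two hypothetical stimulus words.
data = {
    "S1": ["chair", "cold"],
    "S2": ["chair", "hot"],
    "S3": ["stool", "hot"],
}
print(originality_scores(data))  # → {'S1': 3, 'S2': 2, 'S3': 3}
```

S2 gave the commonest response to both words and so receives the lowest score; any monotone weighting (for example, one based on the proportion of the sample giving each word) would preserve that ordering.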

Teacher Rating Scale: A five-point rating scale ranging from “extremely creative” to “extremely uncreative” was developed to obtain teacher judgments. In order to establish the same frame of reference among the teachers, definitions of each of the five rating categories and a list of several examples of behavior which (according to the theories of Guilford and others) would fit the criterion were given to them with the rating scale, along with the following definition adapted for purposes of this study:

Creativity is the capacity of the indi- 
vidual to avoid the usual routine, conven- 
tional ways of thinking and of doing 
things and to produce a quantity of ideas 
and/or products which are original, novel, 
or uncommon and which are workable. It 
must be purposeful or goal directed. It 
may involve the forming of new patterns 
and combinations of information derived 
from past experience, and the transplant- 
ing of old relationships to new situations, 
or the generation of new relationships. 

Procedure. Each of the four classes of Ss 
was tested for two 50-minute periods on the 
same day, in order to prevent out-of-class 
discussion. Each class was tested one period 
by each of the two examiners. Teachers were 
not present. With a few exceptions, the Ss 
took the study seriously, and appeared to 
do their best. 

Art, mathematics, English, social science, and science were considered the most appropriate classes from which to select the teachers. Most of the teachers chosen taught at least two of the areas, so that three teacher ratings were obtained for each S. These ratings were summed and coded for ease in computation. The coded score was the Teacher Rating.


TABLE 1
AGE AND IQ OF JUNIOR HIGH SAMPLE ACCORDING TO SEX AND GRADE

                   N       Age (Mo.)         IQ
                           Mean    SD      Mean    SD
Grade 7
  Males           31      148.2   3.2     119.4
  Females         24      147.8   4.1     118.2
  Total           55      148.0   3.6     118.9    7.5
Grade 8
  Males           21      160.1   4.2     132.7
  Females         33      159.4   4.0     136.3
  Total           54      159.7   4.1     134.9   10.0
Total Sample
  Males           52      153.0   6.9     124.8
  Females         57      154.5   7.0     128.8
  Total          109ᵃ     153.8   7.0     126.9   11.9

ᵃ 5 cases not included in these calculations because of incomplete information.


RESULTS 


In an effort to increase the reliability 
of the scoring, each of Guilford’s tests 
was scored jointly by two of the in- 
vestigators, and in the case of Quick 
Responses and Plot Titles, by all three 
investigators. 

Seven scores were obtained from 
these tests. Unusual Uses, Quick Re- 
sponses, Remote Consequences, and 
Clever Plot Titles represented the Orig- 
inality factor. Brick Uses (Fluency), 
Immediate Consequences, and Non- 
clever Plot Titles represented the Ide- 
ational Fluency factor. 

Table 2 shows the means and stand- 
ard deviations for the creativity tests 
for this sample, as compared with one 
sample of Air Cadets used by Guilford. 
Inspection of the table reveals sub- 
stantially the same types of dispersion 
for the two samples. As would be expected,
the younger group is not quite as
productive, except on the low quality
(nonclever) Plot Titles.
TABLE 2
MEANS AND STANDARD DEVIATIONS OF SCORES ON GUILFORD TESTS FOR TWO SAMPLES

                         Guilford Sample                    Junior High Sample
                         N     Mean    SD    Reliabilityᵃ   Mean    SD    Reliabilityᵇ
Consequences
  Remoteness score      410    10.3    4.7   .66             7.3    4.2   .68
  Low Quality score     410    14.9    6.2   .68            13.5    4.3   .50
Plot Titles
  Cleverness score      364     6.4    3.7   .43             3.4    2.8   .73
  Low Quality score     364    12.2    6.9   .70            19.0    7.9   .90
Quick Responses         410    99.8   18.7   .81            85.8   23.4   .84
Unusual Uses            204    15.4    5.2   .68            10.8    3.8   .66

ᵃ All alternate form estimates except Quick Responses, which is an odd-even estimate.
ᵇ All odd-even estimates.


TABLE 3
GRADE DIFFERENCES ON GUILFORD TESTS AND TEACHER RATINGS
(N = 114)

                         Grade 7                      Grade 8
                         Mean    SD    Reliability    Mean    SD    Reliability
Consequences
  Remoteness score        6.0    3.8   .62             8.4    3.8   .68
  Low Quality score      13.4    4.4   .59            13.6    4.5   .40
Plot Titles
  Cleverness score        2.8    2.2   .56             4.0    2.1   .78
  Low Quality score      17.7    5.8   .87            20.2    7.0   .92
Brick Fluency            18.1    5.3   -              19.8    7.0   -
Quick Responses          82.2   23.1   .82            89.3   23.7   .86
Unusual Uses              9.4    3.2   .45            12.0    4.0   .73
Teacher Ratings           6.1    1.6                   6.2    1.9


Since alternate forms of the tests 
were not available, and since most of 
the tests were made up of separately 
timed units of approximately equal 
difficulty, odd-even estimates of reli- 
ability were computed, using the 
Spearman-Brown formula. Table 2 
shows that the reliabilities for this 
sample compare favorably with Guil- 
ford’s sample, except in the case of 
Immediate (low quality) Conse- 
quences. 
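The odd-even procedure with the Spearman-Brown correction can be sketched as below: sum each S's scores on the odd-numbered and even-numbered timed units, correlate the two half-scores, and step the half-test correlation up to full length with 2r/(1 + r). The unit scores are hypothetical, and the helper names are illustrative rather than taken from the study.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to full-test length (doubling)."""
    return 2 * r_half / (1 + r_half)


def odd_even_reliability(unit_scores: list) -> float:
    """unit_scores[i] holds subject i's scores on the separately timed units."""
    odd = [sum(row[0::2]) for row in unit_scores]   # units 1, 3, 5, ...
    even = [sum(row[1::2]) for row in unit_scores]  # units 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical scores of five subjects on four timed units
# (e.g., the four questions of Consequences).
scores = [[3, 4, 2, 3], [5, 5, 4, 6], [2, 1, 2, 2], [4, 3, 5, 4], [6, 7, 5, 6]]
print(f"{odd_even_reliability(scores):.2f}")
```

For any half-test correlation between 0 and 1 the corrected value is larger (a half-test r of .50 becomes .67), which is why odd-even estimates depend on the halves being of approximately equal difficulty, as the text requires.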

Table 3 shows the Junior High sample broken down into the two grade levels. As would be expected, the eighth grade group is slightly higher for all measures, but results are sufficiently similar to justify combining the two grades.

Correlations were obtained for each 
pair of the three teachers who rated 
each grade. In spite of the efforts 
made to define the criterion, Table 4 
shows that teachers do not agree very 
well in rating students on degree of 
creativity. Only one correlation proved 
to be significantly different from zero. 
While it is evident that teachers are 
using different criteria for their ratings,
their judgments were pooled in order 
to broaden the bases upon which scores 
for creativity were estimated. 

In appraising creativity, sex dif- 
ferences may also be important. Table 
5 shows, however, that in this sample, 
only Brick Fluency showed a con- 
sistent difference in favor of the girls, 
the difference being significant beyond 
the .05 level for the total group. 

Raw scores on the creativity tests 
were correlated with each other, with 
Teacher Ratings and with IQs. Since 
no ratings were available for two Ss, 
all correlations involving Teacher Rat- 
ing were based on 112 Ss. Results 
are shown in Table 6, with the four 
tests purporting to measure Originality 
listed first. It can be seen that these 
four not only intercorrelate positively
with each other, but they are also 
positively correlated with IQ, and, in 
two cases, with Teacher Ratings. The 
three Ideational Fluency tests, on the 
other hand, show no significant rela- 
tionship either with Teacher Ratings 
or IQ. 

When the raw scores were changed 
to standard scores and combined into 
Originality and Ideational Fluency 
indices, as shown in Table 7, only 
Originality showed a significant corre- 
lation with Teacher Ratings, although 
the two indices were moderately cor- 
related with each other. When all 
scores were combined into a Creativity 
Index, no significant correlation re- 
sulted. 
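Combining tests into an index, as described above, amounts to converting each test's raw scores to standard (z) scores and summing across tests for each S. The sketch assumes equal weights and the sample standard deviation, neither of which is specified in the text; the raw scores are hypothetical.

```python
from statistics import mean, stdev


def standard_scores(raw: list) -> list:
    """Convert raw scores to z scores (sample SD; an assumption)."""
    m, s = mean(raw), stdev(raw)
    return [(x - m) / s for x in raw]


def composite_index(tests: dict) -> list:
    """Equally weighted sum of z scores across tests, one value per subject."""
    z_by_test = [standard_scores(scores) for scores in tests.values()]
    return [sum(subject_zs) for subject_zs in zip(*z_by_test)]


# Hypothetical raw scores for four subjects on the four Originality tests.
originality_tests = {
    "Unusual Uses": [9, 12, 15, 8],
    "Quick Responses": [80, 95, 110, 70],
    "Consequences (remote)": [5, 8, 10, 6],
    "Plot Titles (clever)": [2, 4, 5, 3],
}
index = composite_index(originality_tests)
print([round(v, 2) for v in index])
```

Standardizing first keeps a test with a large raw-score range, such as Quick Responses, from dominating the index; the same function applied to all seven scores would give a Creativity Index of the kind correlated in Table 7.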


TABLE 4
CORRELATIONS BETWEEN TEACHER RATINGS OF CREATIVITY

Grade    N     Correlations between pairs of teachers
7-1      24    −.14
7-8      31     .31      .62**    .23
8-3      28     .16      .20      .13
8-8      29     .34      .28      .34

** Significant at the .01 level.
TABLE 5 
SEX DIFFERENCES ON GUILFORD TESTS AND TEACHER RATINGS
Grade 7 Grade 8 Total 
Male Female Male Female Male Female 
Mean; SD | Mean| SD | Mean! SD | Mean; SD | Mean| SD | Mean| SD 
Consequences 
Remoteness score 5.6 | 3.7 | 6.7 | 3.8/| 9.0 | 4.3 | 8.2 4.4] 7.0] 4.3 | 7.6] 4.2 
Low Quality score (13.4 | 4.9 |13.3 | 3.8 |13.6 | 4.4 /13.3 | 3.8 |13.5 | 4.7 |13.3 | 3.8 
Plot Titles | 
Cleverness score 2.8 2.3 | 3.3 | 3.2) 4.6 3.3 | 2.9 | 2.6] 3.8 3.0 
Low Quality score (15.6 | 6.5 |20.5 6.5 [21.6 | 8.2 |20.0 | 8.7 |18.0 | 7.8 |20.0 | 7.8 
Brick Fluency 16.7 | 4.8 |19.8 | 5.5 |18.8 | 5.9 |20.1 | 7.6 {17.6 | 5.3 |20.0*| 6.8 
Quick Responses 80.4 |17.6 |84.6 |28.6 |92.7 (20.4 |87.4 |25.0 |85.4 |86.2 |26.5 
Unusual Uses 9.7 | 3.4 9.2 | 3.0 |11.4 | 4.1 |12.4 | 3.9 |10.4 3.8 |11.0 | 3.9 
Teacher Ratings 6.0 | 1.7 1.5 | 6.6} 1.6 | 6.1 | 1.9 | 6.3 | 1.7} 6.1/1.8 


* Significantly different at .05 level. 
TABLE 6
CORRELATIONS OF CREATIVITY TESTS WITH EACH OTHER,
WITH TEACHER RATINGS, AND WITH IQS
Tests | Quick Resp. | Conseq. Remote | Plot T. Clever | Brick Fluency | Conseq. Immed. | Plot T. Nonclever | Teacher Ratings | IQ
Unusual Uses | .26°* | .45°* | .36%* | .36°* | | .28%* | 07 .32°° 
Quick Responses .28** | .49** | .03 Ol | .05 .29°* 
Conseq. Rem. .40** | .42** | —.12 a 05 .31** 
Plot Titles Clever .25°* i .28°* .34** 
Brick Fluency .37** | .46°* | —.21 01 
Conseq. Immed. | .24 —.002 | —.04 
Plot T. Nonclever 14 — .06 
Teacher Ratings | .20 


** Significant at the .01 level. 


TABLE 7
CORRELATION OF STANDARD TOTAL SCORES
ON GUILFORD FACTORS WITH
TEACHER RATINGS
(N = 112)

               Originality    Teacher Ratings
Fluency           .22*            −.02
Originality                        .23*
Creativity                         .16

* Significant at the .05 level.
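Whether a coefficient in Tables 4 and 7 differs significantly from zero can be checked with a short calculation. The authors presumably used a table of critical values of r; the sketch below swaps in the Fisher z normal approximation, which gives very similar decisions at these sample sizes. The function name `r_two_tailed_p` is illustrative.

```python
import math
from statistics import NormalDist


def r_two_tailed_p(r: float, n: int) -> float:
    """Two-tailed p for H0: rho = 0, via the Fisher z normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)   # approx. standard normal under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Index correlations from Table 7 (N = 112).
for label, r in [("Fluency x Originality", 0.22),
                 ("Originality x Teacher Ratings", 0.23),
                 ("Creativity x Teacher Ratings", 0.16),
                 ("Fluency x Teacher Ratings", -0.02)]:
    p = r_two_tailed_p(r, 112)
    print(f"{label}: r = {r:+.2f}, p = {p:.3f}, significant at .05: {p < 0.05}")
```

The approximation reproduces the pattern reported above: .22 and .23 fall below the .05 level with N = 112 while .16 does not, and in Table 4 the teacher correlation of .62 with N = 31 is significant well beyond .01, whereas .34 with N = 29 is not significant.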


DISCUSSION


One of the questions to be investi- 
gated was the degree to which Guil- 
ford’s tests were suitable for a younger 
age group. Table 2 showed that the 
means and dispersions were very sim- 
ilar to the Air Cadet sample, and the 
reliabilities were, in general, of the 
same order. It would seem, therefore, 
that the tests, even in their present 
experimental form, are appropriate for 
research with bright junior high school 
students. 

The second question had to do with sex and grade differences. In general, it appears that these tests do not discriminate markedly between the sexes. On only one test (Brick Fluency) was the difference consistently and significantly in favor of the girls. Mean grade scores showed small absolute differences in favor of the eighth grade on all tests. Since age variability within each grade was very small, and the mean difference between grades was 11.7 months, the differences might be attributed to age, especially since Guilford's older Air Cadets showed higher scores on all tests reported except Low Quality Plot Titles. Certainly there appears to be an increase with age on all the Originality tests. However, the discrepancy in IQs between seventh and eighth grades must also be considered as a possible contributing factor, especially since the Originality tests were all positively related to IQ.

The third question concerned the 
empirical validation of the tests by 
means of Teacher Ratings. Results 
were disappointing and illustrative of 
the difficulties involved in this area. 
Partly due to the restriction of range 
within the sample, but also undoubt- 
edly due to the vagueness and variabil- 
ity in the popular conception of what 
is meant by creativity, correlations 
between the three teacher ratings for 
each S were low. While pooled ratings 
represent heterogeneous points of view 
regarding creativity, they did correlate 
significantly with two of the Originality 
tests. 

It would seem that we can look only for gross discriminations in teacher ratings of creativity at this age, if behavior in general is to be considered. This is not too surprising when we remember that teacher ratings of intelligence, originally used as a criterion for validating the Binet-Simon tests, are now considered to be less accurate than intelligence tests themselves. Validation of the Guilford tests may have to await longitudinal or follow-up studies in which scores are correlated with actual creative productivity in adult life, unless better methods of rating present performance or products can be devised.

As Guilford has hypothesized,¹ the
tests which have factor loadings on 
Originality appear to be more satis- 
factory in the identification of crea- 
tive talent than do those which have 
loadings on Ideational Fluency. The 
positive but only moderately high cor- 
relations of Originality with IQ also 
confirm his statement that creativity is 
more than intelligence. Getzels and 
Jackson (1958) have also found this 
to be true. 

While the Guilford tests seem to be 
a promising method of identification 
of creative talent, a good deal more 
research is needed on these and other 
current instruments to establish both 
their concurrent and their predictive 
validity. 

SUMMARY

Seven of Guilford's creativity tests, four with factor loadings on Originality and three with factor loadings on Ideational Fluency, were administered to 114 seventh and eighth grade students of above average intelligence and school achievement, and the results were compared with teacher ratings of their creativity. Dispersions and reliabilities were found to be comparable to those of Guilford's Air Cadets, with some suggestion that mean scores increase with age. Tests of Originality correlated more highly with IQ and with Teacher Ratings than did tests of Ideational Fluency. Teacher ratings of creativity proved to be a rather inconsistent criterion for validating the tests, owing both to the restricted range and to the vagueness of the concept of creativity. Sex was not a significant variable on most of the measures.

¹ Personal communication to senior author.
THE INFLUENCE OF SCHOOL CAMPING ON THE SELF-CONCEPTS
AND SOCIAL RELATIONSHIPS OF
SIXTH GRADE SCHOOL CHILDREN¹


JEROME BEKER 
Teachers College, Columbia University 


Many public and private school sys- 
tems across the country have intro- 
duced school camping programs, held 
during the regular school year, as part 
of the regular curriculum. The research 
reported here was undertaken to test 
whether the social and emotional 
growth of school campers over a given 
period of time could be shown to exceed 
that of an otherwise equivalent group 
of school children who had not had a 
school camping experience. It was hy- 
pothesized that the kinds of growth 
being studied can be stimulated in and 
by a social climate that makes it possi- 
ble for children to exert initiative and 
self-determination within a context of 
social awareness and clear limits, and 
with the assistance of sensitive, under- 
standing but not constricting adult 
guidance and leadership. In addition, 
it was hypothesized that school camp- 
ing, because of the very nature of the 
situation, tends to (but, of course, need 
not) provide this kind of social climate. 


METHOD 


Subjects. The members of 17 sixth grade Long Island public school classes, predominantly from middle and lower middle class suburban homes, served as Ss in the research. Thirteen of these classes participated in a total of seven school encampments during the period of the study, while the members of the other four classes served as a control group. The control classes were from some of the same schools as those which went to camp, and were scheduled to participate in school camping 4-6 months later during the same school year. All classes involved were heterogeneous with regard to children’s intelligence and socioeconomic level. The timing of each class’s encampment was determined by the two school systems involved, primarily by chance, except in a few instances when teachers’ preferences were taken into account. There was some attrition of Ss owing to absences on days when tests were administered and to other factors. The final data on self-concepts were based on 261 Ss who had participated in school camping and 96 controls; slightly fewer Ss were involved in the social relationships part of the study, since a number of Ss who did not complete the Social Distance Scale properly were eliminated. The groups were approximately evenly divided between the sexes.

1 The study reported here was part of a doctoral project submitted by the author at Teachers College, Columbia University; the research was conducted at New York University Camp in Sloatsburg, New York. The author is presently a Research Psychologist at Berkshire Farm for Boys, Canaan, New York.

Research Instruments.² A 47-item check list was developed as a means of studying the self-concepts of the Ss at various stages of the research. Most of the items were original, and the others were chosen from about 100 items being used in other studies. Face validity was determined by the judgments of three trained psychologists, so as to include items related to a variety of aspects of the self-concept. Items were approximately evenly divided between those concerned primarily with feelings of competence in individual concerns and those related primarily to feelings of competence as a social being.

Ss indicated for each statement whether “This is very much like me,” “A little bit like me,” or “Not like me at all.” Twenty-six of the items were worded in a “positive” way, e.g., “I can usually trust my judgment,” and 21 items negatively, e.g., “I get upset too easily.” The instrument was pretested on a group comparable to the Ss themselves. The task seemed clearly comprehensible to these children, and their interest was maintained throughout. Illustrative items follow:

I enjoy accepting responsibility.

I expect to be a success some day.

I find it hard to get to know people well.

People expect too much of me.

2 Copies of the instruments used, as well as full details on other aspects of the research, are available in the original project report at the Teachers College Library, Columbia University, New York 27, New York. The title of the document is “The Relationship between School Camp Social Climate and Changes in Children’s Self-Concepts and Patterns of Social Relationship,” by Jerome Beker, 1959.

Social relationships were evaluated through the use of the Classroom Social Distance Scale³ (Cunningham, 1951). On this check list, Ss indicated one of five categories for each of their classmates, as follows: “Would like to have him as one of my best friends,” “Would like to have him in my group but not as a close friend,” “Would like to be with him once in a while but not often or for long at a time,” “Don’t mind his being in our room but I don’t want to have anything to do with him,” or “Wish he weren’t in our room.” This instrument was administered to the Ss together with the check list described above. To encourage honesty on all instruments used, the Ss were assured that their responses would be seen only by the investigator.

A 20-item camp evaluation check list was 
developed to compare the encampments as 
they were perceived by the participants. 
The items were based on the expressed ob- 
jectives of the participating schools. Each 
item stated a feeling about or an opinion of 
some aspect of the camp experience. Campers were asked to indicate whether the item expressed the way they felt “Almost always,” “Sometimes,” or “Almost never” during the camp period. Illustrative items follow:

I felt like helping when my help was 
needed at camp. 

I felt that I had a real part in planning 
the trip. 

I felt afraid of the teachers and counse- 
lors. 

The encampments were also rated by two 
independent adult observers on a five-point 
scale for each of four specific variables. 

Research Design. The self-concept check list and the Classroom Social Distance Scale were administered to each S three times, at school: on the Friday before he left for camp, on the Monday following his return (an interval of 10 days), and between 10 weeks and 3 months later. The same pattern was followed for the control Ss, except that they spent the week between the first and second administrations in their regular classroom program. Thus any effect due merely to the passage of time or to the repetition of the instrument itself would have been roughly equivalent in the two groups. Neither the teachers of the classes involved, who administered the instruments, nor the Ss themselves were informed of the exact purpose of the study, although most were aware that it was related to the school camping program.

3 Used with permission of the Horace Mann-Lincoln Institute of School Experimentation, Teachers College, Columbia University.

Statistical considerations made it necessary to compare the responses made by each S on each item across the three administrations of the self-concept check list. A changed response by an S on any item was recorded as positive or negative. For example, on an item like “I can usually trust my judgment,” a given S might have checked “Not like me at all” on the first administration. If he checked “A little bit like me” or “This is very much like me” on the second administration, this would be scored as a positive shift on that item. In each case, responses on the second and third administrations were compared with those on the first. The same procedure was followed for the Classroom Social Distance Scale, using each S’s Self Social Distance scores⁴ on the three administrations.
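The shift-scoring rule above can be sketched as follows. This is a minimal illustration, not the author’s actual procedure: the 0-2 ordinal coding of the three response categories and the names `adequacy_level` and `classify_shift` are assumptions. Because a “positive” shift means a shift toward greater feelings of adequacy rather than toward agreement with the statement, negatively worded items are reverse-coded.

```python
# Check-list categories ordered from least to most agreement (an assumed 0-2 coding).
RESPONSES = ("Not like me at all", "A little bit like me", "This is very much like me")


def adequacy_level(response: str, positively_worded: bool) -> int:
    """Return a level where a higher value means a stronger feeling of adequacy.

    Agreement signals adequacy on positively worded items ("I can usually
    trust my judgment") but inadequacy on negatively worded ones ("I get
    upset too easily"), so negatively worded items are reversed.
    """
    level = RESPONSES.index(response)
    return level if positively_worded else 2 - level


def classify_shift(first: str, later: str, positively_worded: bool) -> str:
    """Compare a later administration with the first: '+', '-', or '0' (no change)."""
    delta = adequacy_level(later, positively_worded) - adequacy_level(first, positively_worded)
    if delta > 0:
        return "+"
    if delta < 0:
        return "-"
    return "0"


# The example in the text: "Not like me at all" -> "A little bit like me" on a positive item.
print(classify_shift("Not like me at all", "A little bit like me", True))  # +
```

The same change on a negatively worded item, such as moving toward “This is very much like me” on “I get upset too easily,” is scored “-” by this sketch.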

Binomial probabilities were used to determine the statistical significance of the ratio of positive to negative shifts noted on each item, and the .05 level of significance was adopted for this purpose. Thus it was possible to compare the camper group with the
control group on the basis of the number of 
self-concept items that showed statistically 
significant positive or negative shifts from 
the first administration to the second, and 
from the first to the third. Limited compari- 
sons could be based on individual items. The 
shifts in Ss’ Social Distance scores were also 
compared on the basis of the proportion of 
positive and negative shifts. 
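A minimal sketch of an exact binomial (sign) test for one item or one group follows. The paper does not state these details, so the sketch assumes that unchanged responses are dropped, that positive and negative shifts are equally likely under the null hypothesis, and that the test is two-sided; `sign_test_p` is a hypothetical name.

```python
from math import comb


def sign_test_p(positive: int, negative: int) -> float:
    """Exact two-sided binomial (sign) test of H0: P(positive shift) = .5.

    Only changed responses are counted; unchanged ones are assumed dropped.
    """
    n = positive + negative
    if n == 0:
        return 1.0
    k = max(positive, negative)
    # Upper tail P(X >= k) for X ~ Binomial(n, .5), doubled for a two-sided test.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


# Counts approximated from the rounded proportions in Table 2 (first to second administration):
# experimental .68 of 219 shifts ~ 149 positive; control .61 of 76 shifts ~ 46 positive.
print(sign_test_p(149, 70) < 0.05)  # True
print(sign_test_p(46, 30) < 0.05)   # False
```

Under these assumptions the sketch agrees with the significance levels reported in Table 2 for that administration. The counts are reconstructions from rounded proportions, not the study’s raw data.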

The evaluations and ratings of the encampments themselves were introduced in an effort to provide some insight into what it was about the camp experience that produced any significant differences that might appear between the campers and the control group. For this purpose, it was hypothesized that the camp “social climate,” viewed in terms of several predetermined variables, would be related to changes in campers’ self-concepts and Social Distance scores.

4 This is a figure indicating each subject’s acceptance of his classmates. For details, see Cunningham (1951).

The camp evaluation check list men- 
tioned above provided one of the bases on 
which the encampments were compared. 
The items were so phrased that the “positive” or “negative” nature of a response was clearly evident. For example, on the item “I felt bored or confused, and didn’t know what I could do,” the response “Almost always” was considered to be a negative one.
Thus the expectation was that an encamp- 
ment with an overall positive rating (in this 
sense) relative to the other encampments 
would produce a greater positive change in 
campers’ self-concepts and Social Distance 
scores. Each item was given equal weight. 

The two adult observers (the camp direc- 
tor and the investigator) were not directly 
involved in the program but were on the 
scene during each of the seven encampments. They rated each encampment on the following: (a) How “democratic” were the control or discipline patterns used by the leadership? e.g., was respect shown for the rights and dignity of campers as people? (b) How involved were the adult leaders with the campers? How interested were the leaders in the program? Did they care, or were they just doing a job? (c) How involved were the campers in planning, problem solving, decision making, and other “executive processes” in the camp community? (d) What was the overall feeling tone, i.e., how relaxed, friendly, enthusiastic, cooperative
was the group? It was hypothesized that the 
more positive the ratings of a given encamp- 
ment on the sum of these four variables, the 
more positive influence would tend to be ex- 
erted on campers’ self-concepts and Social 
Distance scores. Here again, composite rat- 
ings were developed by arbitrarily assigning 
equal weight to each of the four variables. 
Comparisons were made between the inde- 
pendent ratings made by the two adult ob- 
servers, and between the ratings by the 
adults and the evaluations by the children. 


RESULTS 


Self-Concept. The changes noted on the second administration of the self-concept check list, immediately after the experimental (camp) period, markedly favored the campers over the control group. Campers of both sexes showed significant positive shifts on many more items than did the controls. In general, the experimental group shifted on all items on which the control group shifted, and on numerous others as well. Although more items showed significant positive shifts for both groups on the third administration, the difference between the groups was even greater and in the same direction. The data are summarized in Table 1. The five items on which the experimental group showed the greatest positive change relative to the change shown by the control group on the second and third administrations were: “I am a dependable person,” “I have trouble making up my mind,” “I get upset too easily,” “I worry about what others think of me,” and “I have some outstanding abilities.” Thus it seemed apparent that, as a group, the children who had gone to camp experienced increased feelings of competence as people to an extent not matched by children who had not gone. The effect was not transient; it was evident in even greater magnitude after a lapse of more than 10 weeks.

Social Relationships. In the experimental group, a statistically significant proportion of the shifts in Social Distance scores was positive. This was not true for the control group, although the two proportions were not significantly different from each other. The proportion of positive changes on the third administration was slightly lower for both groups, and was still significant only for the camper group. Again, the proportions were not significantly different from each other. These results suggest that school camping did have some positive influence on campers’ Self Social Distance scores, but the differences between the groups seem too tenuous to support any firmer conclusions. These data are summarized in Table 2.

5 A “positive” shift is, of course, a shift in the direction of increased feelings of adequacy, and not necessarily in the direction of increased agreement with what a given item states.

TABLE 1
NUMBER OF ITEMS SHOWING SIGNIFICANT SHIFTS ON SELF-CONCEPT CHECK LIST
(.05 level of significance; signs indicate direction of shifts)

Number of items showing significant shifts by:
Shifts | Boys | Girls | Entire group, irrespective of sex | N items showing no signif. shift
First to second administration:
  Exper. group (a) | 15+ | 11+ | 22+ | 23 (c)
  Control group (b) | 3+ | 3+ | 4+, 1− | 42
  Reliability of the difference | p < .01 | p < .05 | p < .001 | p < .001
First to third administration:
  Exper. group (a) | 22+ | 19+ | 35+ | 11
  Control group (b) | 6+ | 4+ | 8+ | 36
  Reliability of the difference | p < .001 | p < .001 | p < .001 | p < .001

Note. The difference from the second to the third administration shown by the experimental group was reliable at the .01 level for the number of items showing significant shifts by the entire group and for the number showing no significant shift. It was not reliable at the .05 level for either sex alone in the experimental group, or for any of the four categories in the control group.
(a) N = 261.
(b) N = 96.
(c) The check list consisted of 47 items. The rows total more than 47 because some items shifted in more than one of the three categories listed.

TABLE 2
SHIFTS IN SOCIAL DISTANCE SCORES
(Only shifts of .05 or more in Social Distance scores are included)

Measure | First to second: Exper. | First to second: Control | First to third: Exper. | First to third: Control
Total number | 219 | 76 | 232 | 77
Positive proportion | .68 | .61 | .61 | .52
Significance level | .05 | ns | .05 | ns

Note. The differences between the proportions of the experimental and control groups were not reliable on either the second or the third administrations.

Social Climate. Positive correlations were found between the ratings of the encampments made by the two observers on each of the four variables discussed above, as follows: (a) control or discipline patterns, r = 0.38; (b) involvement of leaders, r = 0.55; (c) involvement of campers, r = 0.65; and (d) overall feeling tone, r = 0.19. None of these correlations is statistically significant, however, given the small number of ratings involved (seven by each observer for each variable). The seven encampments were also arranged in rank order according to the totals of the ratings by each observer. The correlation coefficient of these two rank orders is 0.71, but this figure is not statistically significant. The ratings of both adult observers were then combined to provide a composite rank order of the social climates of the seven encampments.
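The rank-order correlation between the two observers can be illustrated with Spearman’s rho and an exact permutation test, which is feasible for seven encampments (7! = 5040 orderings). The paper does not say which rank coefficient or significance table was used, so this is a sketch under that assumption; the rank orders below are hypothetical, not the study’s data.

```python
from itertools import permutations


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two untied rank orders: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def exact_two_sided_p(rho, n):
    """Share of all n! orderings whose |rho| is at least the observed |rho|."""
    base = list(range(1, n + 1))
    hits = 0
    total = 0
    for perm in permutations(base):
        # Small tolerance so orderings tied with the observed rho count despite float rounding.
        if abs(spearman_rho(base, perm)) >= abs(rho) - 1e-9:
            hits += 1
        total += 1
    return hits / total


ranks_observer_1 = [1, 2, 3, 4, 5, 6, 7]  # hypothetical
ranks_observer_2 = [3, 2, 1, 6, 5, 4, 7]  # hypothetical; sum of squared differences = 16
rho = spearman_rho(ranks_observer_1, ranks_observer_2)
print(round(rho, 2), exact_two_sided_p(rho, 7) > 0.05)  # 0.71 True
```

With seven cases a rho near .71 is not significant at the .05 level (two-sided), whereas a rho near .93 has an exact two-sided p below .01. This is consistent with the paper’s judgments about the coefficients of 0.71 and 0.93.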

The camp evaluation check lists completed by the campers were used as the basis for another ranking of the encampments. For this purpose, each encampment was scored by summing, across items, the net percentage of positive responses on each item.⁶ The correlation of this rank order with that of the composite of the adult ratings is 0.93, which is significant at the .01 level even for as small a number of cases as is involved here.
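The camper-based ranking can be sketched as below. The sketch assumes, as footnote 6 illustrates, that an item’s net score is the percentage of positive responses minus the percentage of negative responses, with neutral responses ignored, and that encampments are ranked by the sum of these item scores. The function names, encampment labels, and percentages in the usage lines are hypothetical. Ties are left in input order by Python’s stable sort, a case the paper does not address.

```python
def item_net_score(pct_positive: float, pct_neutral: float, pct_negative: float) -> float:
    """Net percentage of positive responses on one item (neutral responses ignored)."""
    total = pct_positive + pct_neutral + pct_negative
    if abs(total - 100) > 1e-6:
        raise ValueError(f"percentages should sum to 100, got {total}")
    return pct_positive - pct_negative


def encampment_score(items: list[tuple[float, float, float]]) -> float:
    """Sum of item net scores; every item carries equal weight."""
    return sum(item_net_score(*item) for item in items)


def rank_encampments(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = most positive evaluation; tied scores keep their input order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Footnote 6 example: 60% positive, 30% neutral, 10% negative.
print(item_net_score(60, 30, 10))  # 50
```

The resulting rank order is what would then be correlated with the adults’ composite rank order.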

The three encampments rated most 
highly both by the campers and by the 
adult observers were compared with 
the three rated least highly. (The en- 
campment which appeared in the 
middle of both rank orders was omitted 
for the purpose of this comparison.) 
The differences between these two 
groups in changes in campers’ self- 
concepts and social relationships on 
either the second or third administra- 
tion were not reliable. 


DISCUSSION 


The results suggest that school camping can have a marked positive impact on children’s self-concepts and, perhaps, on their social relationships as well. The precise nature and depth of this influence and its specific determinants, however, remain obscure.⁷ It is suggested that future research on the determinants consider the variable of program content, in addition to those discussed above. If the specific elements in the school camping experience and climate that tend to promote camper growth can be identified, it may be possible to apply them in the classroom and elsewhere, as well as in camping itself. Thus, an understanding of the impact of school camping may suggest ways of increasing the potency of a variety of educational settings.

6 For example, assume that 60% of the campers on a given encampment gave the positive response on a given item, 30% gave the neutral response, and 10% gave the negative response. A “score” of 50 would be credited to that encampment on that item. The rank order was determined on the basis of the sum of these “scores” for each encampment.

7 This problem is discussed at length in the complete project report (see Footnote 2).


SUMMARY 


A self-concept check list and the 
Classroom Social Distance Scale were 
used to evaluate emotional and social 
growth of 13 school classes of sixth 
graders participating in five-day school 
camping programs as part of their 
regular school curriculum. Four non- 
participating classes provided control 
subjects. Campers, using a check list, 
and adult observers rated the “social 
climate” of each encampment. Rank 
order ratings by the adults and children 
correlated closely. Significant and 
marked positive changes in self-con- 
cepts were shown by the campers. The 
control group did not reflect these 
changes. The differences were even 
greater after a lapse of 10 weeks than 
immediately after the camp experi- 
ence, suggesting the continuation of 
growth processes started at camp. 
There also seemed to be a slight posi- 
tive influence on campers’ social rela- 
tionships, but the gain was not reliably 
greater than that of the control group. 
The results were not related to the 
“social climates” of the encampments, 
as rated by adult observers and the 
campers themselves. 